Creating Knowledge out of Interlinked Data
http://lod2.eu

ISWC – 2013/10/23 – Page 1

Integrating NLP using Linked Data
Sebastian Hellmann, Jens Lehmann, Sören Auer and Martin Brümmer

http://slideshare.net/kurzum
http://nlp2rdf.org
http://lod2.eu

AKSW, Universität Leipzig

Introduction

Core problems in integrating NLP:
1. Too much heterogeneity
2. Almost no open standards available
3. Lack of open collaboration
4. Difficult and large domain

Problem analysis
Hardly any reusability in NLP
• Free software (as in free beer), but no open licenses
• Few standards and few mappings
• Integration is hard-wired (you have to write software)
– for each tool, for each framework
Main benefits of using RDF, OWL and Linked Data are:
• lower entry barrier (as a client / user)
• easy data integration (linking, mapping)
• reusability of tools and conceptualisations (ontologies)
• off-the-shelf solutions for common tasks

The Semantic Gap

NLP2RDF project
NLP2RDF (http://nlp2rdf.org)
- community project bootstrapped by LOD2
- develops NLP Interchange Format (NIF)
- umbrella project to combine (and consolidate) existing work

NIF Overview
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
→ to create an ecosystem of interoperable web services
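
To make the idea of an RDF-based annotation format concrete, here is a minimal sketch that prints Turtle for one text and one annotated substring. The document URI `http://example.org/doc1` and the sample sentence are hypothetical. The class and property names (`nif:Context`, `nif:String`, `nif:isString`, `nif:referenceContext`, `nif:beginIndex`, `nif:endIndex`, `nif:anchorOf`) and the index datatype are assumptions based on the NIF 2.0 core ontology and should be checked against the published version before use.

```python
# Sketch only: vocabulary terms are assumed from the NIF 2.0 core ontology.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def turtle_literal(value: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def nif_turtle(doc_uri: str, text: str, begin: int, end: int) -> str:
    """Return Turtle describing the whole text and the span text[begin:end]."""
    context = f"<{doc_uri}#char=0,{len(text)}>"  # URI of the full text
    span = f"<{doc_uri}#char={begin},{end}>"     # URI of the annotated substring
    return "\n".join([
        f"@prefix nif: <{NIF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{context} a nif:Context ;",
        f"    nif:isString {turtle_literal(text)} .",
        "",
        f"{span} a nif:String ;",
        f"    nif:referenceContext {context} ;",
        f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
        f"    nif:anchorOf {turtle_literal(text[begin:end])} .",
    ])


if __name__ == "__main__":
    # Hypothetical document URI and sample sentence.
    print(nif_turtle("http://example.org/doc1", "NIF links NLP tools.", 0, 3))
```

Because every tool would address the same substring with the same URI, annotations from different tools can be merged by simply loading their triples into one graph.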

• Reuse of existing standards such as RDF, OWL2, the PROV Ontology, LAF (ISO 24612), Unicode and RFC 5147
• Standardize access parameters, annotations (e.g. tokenization), validation and log messages
• Reuse of existing ontologies

Example NIF Workflow

A NIF workflow, however, cannot provide better performance (F-measure, speed) than a properly configured UIMA or GATE pipeline with the same components.

Use Cases
• Internationalization Tag Set 2.0 (ITS 2.0)
• Part of Speech Tagging
• Wikifier API access via RDFaCE (Entity Linking)

UC1 - Internationalisation Tagset 2.0

• NIF will be the recommended RDF conversion of the W3C Internationalization Tag Set 2.0 (ITS 2.0) - http://www.w3.org/TR/its20/
• NIF turns out to have a unique selling proposition regarding NLP and RDF
• There was no suitable alternative RDF vocabulary available for this conversion.

Source: http://www.w3.org/TR/its20/#EX-HTML-whitespace-normalization

ITS 2.0

RDFa parsers lose all provenance information:
<http://examples.com/books/wikinomics> dc:title "Wikinomics" .

Source: https://en.wikipedia.org/wiki/RDFa
UC1 - Internationalisation Tagset 2.0

String offset based on:
- Unicode NFC, code points
- ISO 24612
- RFC 5147
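
The offset scheme above can be sketched as follows: normalize the text to NFC, count in code points, and express the span as an RFC 5147 `#char=begin,end` fragment. This is an illustrative sketch, not the reference implementation; the function name `char_fragment_uri` and the document URI are hypothetical. Python strings are indexed by code point, so `str.find` and `len` already count in the right unit.

```python
import unicodedata


def char_fragment_uri(doc_uri: str, text: str, substring: str) -> str:
    """Build a '#char=begin,end' URI for the first occurrence of substring."""
    # Normalize both strings so offsets do not depend on how the
    # input happened to encode accented characters.
    nfc_text = unicodedata.normalize("NFC", text)
    nfc_sub = unicodedata.normalize("NFC", substring)
    begin = nfc_text.find(nfc_sub)  # offset in code points
    if begin < 0:
        raise ValueError("substring not found in text")
    end = begin + len(nfc_sub)      # end offset is exclusive
    return f"{doc_uri}#char={begin},{end}"


if __name__ == "__main__":
    # "e" followed by a combining acute accent: two code points before NFC.
    raw = "Cafe\u0301 Leipzig"
    print(char_fragment_uri("http://example.org/doc", raw, "Leipzig"))
```

Without the normalization step, the combining accent would shift every later offset by one, and two tools reading the same document could produce different URIs for the same word.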

UC2 – Part of Speech Tagging

Please see the paper:

http://purl.org/olia

UC3 – Wikifier API access via RDFaCE

https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki

http://rdface.aksw.org/

Evaluation
Please see the paper!
1) Quantitative Analysis with Google Wikilinks Corpus as NIF RDF
• Crawl of 3 million web sites, 40 million Wikipedia links
• ~ 477 million triples in NIF
2) Questionnaire and Developer Study for NIF 1.0
• NIF 1.0 was released in September 2009
• Over 30 known implementations (22 not from authors)
• 14 developers participated in the study
• A minimal NIF implementation requires fewer than 500 LoC
3) Qualitative Comparison with other Frameworks and Formats
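
As a rough sanity check on the corpus figures in 1), the ratios can be worked out directly. This is back-of-the-envelope arithmetic on the rounded numbers from the slide; it assumes, for illustration only, that triples are spread evenly over links, which the slide does not state.

```python
# Rounded figures from the slide.
web_sites = 3_000_000
wikipedia_links = 40_000_000
nif_triples = 477_000_000

# Average number of triples per annotated link (assumes an even spread).
triples_per_link = nif_triples / wikipedia_links

# Average number of Wikipedia links per crawled site.
links_per_site = wikipedia_links / web_sites

print(f"{triples_per_link:.1f} triples per link")
print(f"{links_per_site:.1f} links per site")
```

Roughly a dozen triples per link is what one would expect if each annotation carries its offsets, anchor text, context reference and link target.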

State of NIF 2.0
Corpora as Linked Data
• Wikilinks corpus - http://wiki-link.nlp2rdf.org
• KORE 50 - http://www.yovisto.com/labs/ner-benchmarks/
• DBpedia Spotlight dataset
Tools
• entityclassifier.eu – http://entityclassifier.eu
• Spotlight - http://spotlight.dbpedia.org
• Open NLP
• Stanford CoreNLP - https://github.com/NLP2RDF/software
• Validator - https://github.com/NLP2RDF/software

State of NIF 2.0
• Rollout is in progress
• Distributed implementation at different speed and quality
• Software lifecycle:
– Implementation
– Testing/Validation
– Integration in the main software
– Deployment as a web service
• Hosted web services are often not up to date, even when the code base is

How to join - http://nlp2rdf.org

For ontology creators
NLP2RDF provides infrastructure for your NLP ontologies

• Redundant, persistent hosting
• Maven packages
• Code and documentation generation
• Continuous Integration (planned)
• Indexing
• Validation of instance data

Please write to me or the mailing list
nlp2rdf@lists.informatik.uni-leipzig.de

Take home message
• Early industrial uptake
– OpenLink, Vistatech.ie, Zemanta, Tenforce, Unister
– ITS 2.0 W3C standard was driven by localization industry
• NIF is open and free (CC0 planned)
• NIF is designed to be a cost-saver

Not primarily aimed at increasing features or performance (F-Measure)

Thanks for your attention
Open Community – All feedback is welcome!
http://slideshare.net/kurzum
Websites:
http://nlp2rdf.org
http://lod2.eu

Annotations

Scalability - Salzburg Research KMT

https://bitbucket.org/srfgkmt/stanbol-nlp

Unicode Normal Form C

• Recommendation for RDF Literals
• http://unicode.org/reports/tr15/#Norm_Forms
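
A short illustration of why the normal form matters for string offsets: the same visible text can have different code point counts before and after normalization. The sketch uses only Python's standard `unicodedata` module; the sample string is chosen for illustration.

```python
import unicodedata

# "ü" written as "u" plus a combining diaeresis (decomposed form).
decomposed = "Bru\u0308mmer"
composed = unicodedata.normalize("NFC", decomposed)

# Both render identically, but the code point counts differ.
print(len(decomposed), len(composed))

# Any offset after the accent shifts by one between the two forms.
print(decomposed.find("mmer"), composed.find("mmer"))
```

Fixing NFC as the single form removes this ambiguity, so every producer and consumer computes the same offsets for the same text.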

Tokenization

Christian Chiarcos, Julia Ritz, Manfred Stede: By all these lovely tokens... Merging conflicting tokenizations.
Language Resources and Evaluation 46(1): 53-74 (2012)

Validation over specification

• SPARQL queries produce (find) errors
• http://persistence.uni-leipzig.org/nlp2rdf/ontologies/testcase/lib/nif-2.0-suite.t
• RLOG – An RDF Logging Ontology
• ./validate.jar -i nif-erroneous-model.ttl -t file
• Demo → character count
• Demo → all errors

ALL DEMOS ARE AVAILABLE AT:
http://nlp2rdf.org/leipzig-24-9-2013
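
The kind of check behind the "character count" demo can be sketched in plain Python. This is only an analogue: the actual validator runs SPARQL test queries and reports through RLOG, whereas the `Annotation` class and `validate` function below are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    begin: int       # start offset, inclusive
    end: int         # end offset, exclusive
    anchor_of: str   # the text the annotation claims to cover


def validate(context: str, annotations: List[Annotation]) -> List[str]:
    """Return one error message per inconsistent annotation."""
    errors = []
    for i, a in enumerate(annotations):
        if not (0 <= a.begin <= a.end <= len(context)):
            errors.append(f"annotation {i}: offsets out of range")
        elif a.end - a.begin != len(a.anchor_of):
            errors.append(f"annotation {i}: character count mismatch")
        elif context[a.begin:a.end] != a.anchor_of:
            errors.append(f"annotation {i}: anchor text mismatch")
    return errors


if __name__ == "__main__":
    text = "NIF links NLP tools."
    checked = validate(text, [
        Annotation(0, 3, "NIF"),    # consistent
        Annotation(4, 9, "link"),   # span has 5 characters, anchor has 4
        Annotation(10, 13, "NIF"),  # right length, wrong text
    ])
    for message in checked:
        print(message)
```

Checking instance data this way, rather than relying on implementers reading a specification closely, is the point of the slide title: errors are found mechanically and reported in a uniform format.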

NIF

Demo:
http://nlp2rdf.lod2.eu/demo.php

OLiA

http://purl.org/olia

