SlideShare a Scribd company logo
LODifier: Generating Linked Data from
Unstructured Text
Isabelle Augenstein1 Sebastian Pad´o1 Sebastian Rudolph2
1Department of Computational Linguistics, Universit¨at Heidelberg, Germany
2Institute AIFB, Karlsruhe Institute of Technology, Germany
{augenste,pado}@cl.uni-heidelberg.de, rudolph@kit.edu
Extended Semantic Web Conference, 2012
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 1 / 28
Outline
1 Motivation
2 The LODifier Architecture and Workflow
3 Evaluation in a Document Similarity Task
4 Conclusion
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 2 / 28
Introduction
Introduction
What?
Represent information from NL text as Linked Data
Creation of graph-based representation for unstructured plain text
Why?
Task is easy for structured text, but quite challenging for unstructured
text
Most other approaches are very selective, e.g., extract only
pre-defined relations
Our approach translates the text in its entirety
Possible use cases: Domain-independent scenarios in which no
pre-defined schema is available
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 3 / 28
Introduction
Introduction
How?
Use noise-tolerant NLP pipeline to analyze text
Translate result into RDF representation
Embed output in the LOD cloud by using vocabulary from DBPedia
and Wordnet
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 4 / 28
System Overview Pipeline
http:
//ww
w.w3
.org/
RDF
/icon
s/rdf
_flye
r.svg
1 of 1
08.12
.11
23:48
lemmatized
text
NE-marked
text
parsed text
word sense
URIs
parsed text
DRS
tokenized
text
DBpedia
URIs
LOD-linked
RDF
unstructured
text
Named Entity
Recognition
(Wikifier)
Parsing
(C&C)
Deep Semantic
Analysis
(Boxer)
Lemmatization
Word Sense
Disambiguation
(UKB)
Tokenization
RDF Generation and Integration
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 5 / 28
System NLP analysis steps
NLP analysis steps (NER)
Named Entity Recognition (Wikifier 1):
Named entities are recognized using Wikifier
Wikifier finds Wikipedia links for named entities
Wikipedia URLs are converted to DBPedia URIs
The New York Times reported that John McCarthy died.
He invented the programming language LISP.
Figure: Test sentences
[[The New York Times]] reported that [[John McCarthy (computer
scientist)|John McCarthy]] died. He invented the [[Programming
language|programming language]] [[Lisp (programming language)|
Lisp]].
Figure: Wikifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 6 / 28
System NLP analysis steps
NLP analysis steps (WSD)
Word Sense Disambiguation (UKB 2)):
Words are disambiguated using UKB, an unsupervised graph-based
WSD tool
UKB finds WordNet links for word senses
UKB output is converted to RDF WordNet URIs
ctx_0 w1 00965687-v !! report
ctx_0 w4 00358431-v !! die
ctx_1 w7 01634424-v !! invent
ctx_1 w9 06898352-n !! programming_language
ctx_1 w10 06901936-n !! lisp
Figure: UKB output for the test sentences.
2
http://ixa2.si.ehu.es/ukb/
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 7 / 28
System NLP analysis steps
NLP analysis steps (Parsing and Semantic Analysis)
Parsing (C&C 3):
Sentences are parsed with the C&C parser
The C&C parser is a statistical parser that uses the combined
categorial grammar (CCG)
CCG is a semantically motivated grammar, making it suitable for
further semantic analysis
Deep Semantic Analysis (Boxer 4):
Boxer builds on the output of C&C and produces discourse
representation structures (DRSs)
DRSs model the meaning of texts in terms of the relevant entities
(discourse referents) and relations between them (conditions)
3
http://svn.ask.it.usyd.edu.au/trac/candc/
4
http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 8 / 28
System NLP analysis steps
______________________________ _______________________
| x0 x1 x2 x3 | | x4 x5 x6 |
|..............................| |.......................|
(| male(x0) |+| event(x4) |)
| named(x0,john_mccarthy,per) | | invent(x4) |
| programming_language(x1) | | agent(x4,x0) |
| nn(x1,x2) | | patient(x4,x2) |
| named(x2,lisp,nam) | | event(x5) |
| named(x3,new_york_times,org) | | report(x5) |
|______________________________| | agent(x5,x3) |
| theme(x5,x6) |
| proposition(x6) |
| ______________ |
| | x7 | |
| x6:|..............| |
| | event(x7) | |
| | die(x7) | |
| | agent(x7,x0) | |
| |______________| |
|_______________________|
Figure: Boxer output for the example sentences
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 9 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 10 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 11 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 12 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 13 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 14 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 15 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 16 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 17 / 28
System NLP analysis steps
_:var0x0 drsclass:named ne:john_mccarthy ;
rdf:type drsclass:male , foaf:Person ;
owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 .
drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object drsclass:event . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate rdf:type ;
rdf:object wn30:wordsense-die-verb-1 . ]
reify:conjunct [ rdf:subject _:var0x7 ;
rdf:predicate drsrel:agent ;
rdf:object _:var0x0 . ]
Figure: LODifier output for the test sentences.
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 18 / 28
Evaluation Evaluation overview
Evaluation overview
Task:
Assess document similarity
Story link detection task, part of topic detection and tracking (TDT)
family of tasks
Data:
TDT-2 benchmark dataset 5
Consists of newspaper, TV and radio news in English, Arabic and
Mandarin
Subset: English language, only newspaper articles
183 positive and negative document pairs
5
http://projects.ldc.upenn.edu/TDT2/
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 19 / 28
Evaluation Evaluation overview
Evaluation overview
Method:
Define various document similarity measures
Measures without structural information
Measures with structural information
Documents are predicted to have the same topic if they have a
similarity of θ or more
θ is determined with supervised learning
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 20 / 28
Evaluation Evaluation method
Evaluation method
Similarity measures without structure:
Random baseline
Bag-of-words baseline: Word overlap between the two documents in
a document pair
Bag-of-URI baseline: URI overlap between two RDF documents
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 21 / 28
Evaluation Evaluation method
Evaluation method
URI class variants:
Three variants for URI baseline: Take the relative importance of
various URI classes into account
Variant 1: All NEs identified by Wikifier, all words successfully
disambiguated by UKB (namespaces dbpedia:, wn30:)
Variant 2: Adds all NEs recognized by Boxer (namespace ne:)
Variant 3: Further adds the URIs for all words that were not
recognized by Wikifier, UKB or Boxer (namespace class:)
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 22 / 28
Evaluation Evaluation method
Evaluation method
Extended setting:
Aims at drawing more information from LOD cloud into the similarity
computation
The generated RDF graph is enriched by:
DBpedia categories (namespace dbpedia:)
WordNet synsets and WordNet senses related to the URIs in the
generated graph (namespace wn30:)
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 23 / 28
Evaluation Evaluation method
Evaluation method
Similarity measures with structure:
Full-fledged graph similarity measures, e.g. based on isomorphic
subgraphs, are infeasible due to the size of the RDF graphs
Our similarity measures are based on the shortest paths between
relevant nodes
Our intuition is that short paths between URI pairs denote relevant
semantic information and we expect they are related by a short path
in other documents as well
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 24 / 28
Evaluation Evaluation method
Evaluation method
Similarity measures with structure:
proSimk,Rel,f (G1, G2) =
a,b∈Rel(G1)
a,b ∈Ck (G1)∩Ck (G2)
f ( (a, b))
a,b∈Rel(G1)
a,b ∈Ck (G1)
f ( (a, b))
k: path length, Rel: semantic relations, f : similarity function
proSimcnt: Uses f ( ) = 1, i.e., just counts the number of paths
irrespective of their length
proSimlen: Uses f ( ) = 1/ , giving less weight to longer paths
proSimsqlen: Uses f ( ) = 1/
√
, discounting long paths less
aggressively than proSimlen
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 25 / 28
Evaluation Results
Results
Model normal extended
Similarity measures without structure
Random Baseline 50.0 –
Bag of Words 63.0 –
Bag of URIs (Variant 3) 73.4 76.4
Similarity measures with structure
proSimcnt (k=6, Variant 1) 77.7 77.6
proSimcnt (k=6, Variant 2) 79.2 79.0
proSimcnt (k=8, Variant 3) 82.1 81.9
proSimlen (k=6, Variant 3) 81.5 81.4
proSimsqlen (k=8, Variant 3) 81.1 80.9
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 26 / 28
Summary
Summary
Main contributions:
Service for the LOD community that can be used in different
scenarios that can profit from structural representation of text
Combination of well-established concepts and systems in a deep
semantic pipeline
Linking existing NLP technologies to the LOD cloud
Evaluation of LODifier in a topic link detection task gives clear
evidence of its potential
Possible future work directions:
Instead of a pipeline approach, simultaneously consider information
from the NLP components to allow interaction between them
Test LODifier in different scenarios
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 27 / 28
Summary
Thank you!
Augenstein, Pad´o, Rudolph LODifier ESWC 2012 28 / 28

More Related Content

What's hot

Distributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data ManagementDistributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data Management
OlafGoerlitz
 
Named Entity Recognition from Online News
Named Entity Recognition from Online NewsNamed Entity Recognition from Online News
Named Entity Recognition from Online News
Bernardo Najlis
 
Triple Stores
Triple StoresTriple Stores
Triple Stores
Stephan Volmer
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Dr.-Ing. Thomas Hartmann
 
LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)
Chris Mungall
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Olaf Hartig
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
WU (Vienna University of Economics and Business)
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Olaf Hartig
 
CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web Tutorial
LeeFeigenbaum
 
Querying Linked Data with SPARQL
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQL
Olaf Hartig
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLPRobert Viseur
 
ShEx vs SHACL
ShEx vs SHACLShEx vs SHACL
ShEx vs SHACL
Jose Emilio Labra Gayo
 
TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...
TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...
TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...
JohannWanja
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
Rrubaa Panchendrarajan
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...European Data Forum
 
RDF Constraint Checking using RDF Data Descriptions (RDD)
RDF Constraint Checking using RDF Data Descriptions (RDD)RDF Constraint Checking using RDF Data Descriptions (RDD)
RDF Constraint Checking using RDF Data Descriptions (RDD)
Alexander Schätzle
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
PlanetData Network of Excellence
 
AINL 2016: Yagunova
AINL 2016: YagunovaAINL 2016: Yagunova
AINL 2016: Yagunova
Lidia Pivovarova
 
Quantifying RDF data sets
Quantifying RDF data setsQuantifying RDF data sets
Quantifying RDF data sets
Janos Hajagos
 
How we use functional programming to find the bad guys @ Build Stuff LT and U...
How we use functional programming to find the bad guys @ Build Stuff LT and U...How we use functional programming to find the bad guys @ Build Stuff LT and U...
How we use functional programming to find the bad guys @ Build Stuff LT and U...
Richard Minerich
 

What's hot (20)

Distributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data ManagementDistributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data Management
 
Named Entity Recognition from Online News
Named Entity Recognition from Online NewsNamed Entity Recognition from Online News
Named Entity Recognition from Online News
 
Triple Stores
Triple StoresTriple Stores
Triple Stores
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
 
CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web Tutorial
 
Querying Linked Data with SPARQL
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQL
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
ShEx vs SHACL
ShEx vs SHACLShEx vs SHACL
ShEx vs SHACL
 
TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...
TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...
TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from th...
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
 
RDF Constraint Checking using RDF Data Descriptions (RDD)
RDF Constraint Checking using RDF Data Descriptions (RDD)RDF Constraint Checking using RDF Data Descriptions (RDD)
RDF Constraint Checking using RDF Data Descriptions (RDD)
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
AINL 2016: Yagunova
AINL 2016: YagunovaAINL 2016: Yagunova
AINL 2016: Yagunova
 
Quantifying RDF data sets
Quantifying RDF data setsQuantifying RDF data sets
Quantifying RDF data sets
 
How we use functional programming to find the bad guys @ Build Stuff LT and U...
How we use functional programming to find the bad guys @ Build Stuff LT and U...How we use functional programming to find the bad guys @ Build Stuff LT and U...
How we use functional programming to find the bad guys @ Build Stuff LT and U...
 

Viewers also liked

Deploying RDF Linked Data via Virtuoso Universal Server
Deploying RDF Linked Data via Virtuoso Universal ServerDeploying RDF Linked Data via Virtuoso Universal Server
Deploying RDF Linked Data via Virtuoso Universal Serverrumito
 
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with AutoencodersUSFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
Isabelle Augenstein
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine Reading
Isabelle Augenstein
 
Seed Selection for Distantly Supervised Web-Based Relation Extraction
Seed Selection for Distantly Supervised Web-Based Relation ExtractionSeed Selection for Distantly Supervised Web-Based Relation Extraction
Seed Selection for Distantly Supervised Web-Based Relation Extraction
Isabelle Augenstein
 
Relation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant SupervisionRelation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant Supervision
Isabelle Augenstein
 
Distant Supervision with Imitation Learning
Distant Supervision with Imitation LearningDistant Supervision with Imitation Learning
Distant Supervision with Imitation Learning
Isabelle Augenstein
 
Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012
Peter Mika
 
Human Neural Machine
Human Neural MachineHuman Neural Machine
Human Neural Machine
Georgios Spithourakis
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)
Andre Freitas
 
Information Extraction with Linked Data
Information Extraction with Linked DataInformation Extraction with Linked Data
Information Extraction with Linked Data
Isabelle Augenstein
 
Natural Language Processing for the Semantic Web
Natural Language Processing for the Semantic WebNatural Language Processing for the Semantic Web
Natural Language Processing for the Semantic Web
Isabelle Augenstein
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
Marina Santini
 
Semantic Search Over The Web
Semantic Search Over The WebSemantic Search Over The Web
Semantic Search Over The Web
alierkan
 
Management information system question and answers
Management information system question and answersManagement information system question and answers
Management information system question and answerspradeep acharya
 
Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question Answering
Sujit Pal
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
Traian Rebedea
 
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Glen Cathey
 
Web 3.0 The Semantic Web
Web 3.0 The Semantic WebWeb 3.0 The Semantic Web
Web 3.0 The Semantic Web
Hatem Mahmoud
 

Viewers also liked (19)

Deploying RDF Linked Data via Virtuoso Universal Server
Deploying RDF Linked Data via Virtuoso Universal ServerDeploying RDF Linked Data via Virtuoso Universal Server
Deploying RDF Linked Data via Virtuoso Universal Server
 
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with AutoencodersUSFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine Reading
 
Seed Selection for Distantly Supervised Web-Based Relation Extraction
Seed Selection for Distantly Supervised Web-Based Relation ExtractionSeed Selection for Distantly Supervised Web-Based Relation Extraction
Seed Selection for Distantly Supervised Web-Based Relation Extraction
 
Relation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant SupervisionRelation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant Supervision
 
Distant Supervision with Imitation Learning
Distant Supervision with Imitation LearningDistant Supervision with Imitation Learning
Distant Supervision with Imitation Learning
 
Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012
 
Mapping Keywords to
Mapping Keywords to Mapping Keywords to
Mapping Keywords to
 
Human Neural Machine
Human Neural MachineHuman Neural Machine
Human Neural Machine
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)
 
Information Extraction with Linked Data
Information Extraction with Linked DataInformation Extraction with Linked Data
Information Extraction with Linked Data
 
Natural Language Processing for the Semantic Web
Natural Language Processing for the Semantic WebNatural Language Processing for the Semantic Web
Natural Language Processing for the Semantic Web
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
 
Semantic Search Over The Web
Semantic Search Over The WebSemantic Search Over The Web
Semantic Search Over The Web
 
Management information system question and answers
Management information system question and answersManagement information system question and answers
Management information system question and answers
 
Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question Answering
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
 
Web 3.0 The Semantic Web
Web 3.0 The Semantic WebWeb 3.0 The Semantic Web
Web 3.0 The Semantic Web
 

Similar to Lodifier: Generating Linked Data from Unstructured Text

Optimized index structures for querying rdf from the web
Optimized index structures for querying rdf from the webOptimized index structures for querying rdf from the web
Optimized index structures for querying rdf from the web
Mahdi Atawneh
 
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainFacilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Christophe Debruyne
 
2014.12 - Let's Disco - 2 (EDDI 2014)
2014.12 - Let's Disco - 2 (EDDI 2014)2014.12 - Let's Disco - 2 (EDDI 2014)
2014.12 - Let's Disco - 2 (EDDI 2014)
Dr.-Ing. Thomas Hartmann
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial data
Roberto García
 
DRESD Project Presentation - December 2006
DRESD Project Presentation - December 2006DRESD Project Presentation - December 2006
DRESD Project Presentation - December 2006
santa
 
LOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack ArchitectureLOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2 Creating Knowledge out of Interlinked Data
 
Graph Representation Learning
Graph Representation LearningGraph Representation Learning
Graph Representation Learning
Jure Leskovec
 
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Jose Emilio Labra Gayo
 
Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic Data
Steffen Staab
 
Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Datamining at SemWebPro 2012
Datamining at SemWebPro 2012
Vincent Michel
 
ontop: A tutorial
ontop: A tutorialontop: A tutorial
ontop: A tutorial
Mariano Rodriguez-Muro
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
Laura Po
 
Linked Data Visualization Model - KEG VŠE
Linked Data Visualization Model - KEG VŠELinked Data Visualization Model - KEG VŠE
Linked Data Visualization Model - KEG VŠE
Jiří Helmich
 
Linked Open Data - Masaryk University in Brno 8.11.2016
Linked Open Data - Masaryk University in Brno 8.11.2016Linked Open Data - Masaryk University in Brno 8.11.2016
Linked Open Data - Masaryk University in Brno 8.11.2016
Martin Necasky
 
Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255
deffa5
 
Ontologies Ontop Databases
Ontologies Ontop DatabasesOntologies Ontop Databases
Ontologies Ontop Databases
Martín Rezk
 
Multilingual qa
Multilingual qaMultilingual qa
Multilingual qa
shakimov
 
RDF Validation Future work and applications
RDF Validation Future work and applicationsRDF Validation Future work and applications
RDF Validation Future work and applications
Jose Emilio Labra Gayo
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Ontotext
 
Not-So-Linked Solution to the Linked Data Mining Challenge 2016
Not-So-Linked Solution to the Linked Data Mining Challenge 2016Not-So-Linked Solution to the Linked Data Mining Challenge 2016
Not-So-Linked Solution to the Linked Data Mining Challenge 2016
Jędrzej Potoniec
 

Similar to Lodifier: Generating Linked Data from Unstructured Text (20)

Optimized index structures for querying rdf from the web
Optimized index structures for querying rdf from the webOptimized index structures for querying rdf from the web
Optimized index structures for querying rdf from the web
 
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainFacilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
 
2014.12 - Let's Disco - 2 (EDDI 2014)
2014.12 - Let's Disco - 2 (EDDI 2014)2014.12 - Let's Disco - 2 (EDDI 2014)
2014.12 - Let's Disco - 2 (EDDI 2014)
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial data
 
DRESD Project Presentation - December 2006
DRESD Project Presentation - December 2006DRESD Project Presentation - December 2006
DRESD Project Presentation - December 2006
 
LOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack ArchitectureLOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack Architecture
 
Graph Representation Learning
Graph Representation LearningGraph Representation Learning
Graph Representation Learning
 
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
 
Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic Data
 
Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Datamining at SemWebPro 2012
Datamining at SemWebPro 2012
 
ontop: A tutorial
ontop: A tutorialontop: A tutorial
ontop: A tutorial
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
Linked Data Visualization Model - KEG VŠE
Linked Data Visualization Model - KEG VŠELinked Data Visualization Model - KEG VŠE
Linked Data Visualization Model - KEG VŠE
 
Linked Open Data - Masaryk University in Brno 8.11.2016
Linked Open Data - Masaryk University in Brno 8.11.2016Linked Open Data - Masaryk University in Brno 8.11.2016
Linked Open Data - Masaryk University in Brno 8.11.2016
 
Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255
 
Ontologies Ontop Databases
Ontologies Ontop DatabasesOntologies Ontop Databases
Ontologies Ontop Databases
 
Multilingual qa
Multilingual qaMultilingual qa
Multilingual qa
 
RDF Validation Future work and applications
RDF Validation Future work and applicationsRDF Validation Future work and applications
RDF Validation Future work and applications
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 
Not-So-Linked Solution to the Linked Data Mining Challenge 2016
Not-So-Linked Solution to the Linked Data Mining Challenge 2016Not-So-Linked Solution to the Linked Data Mining Challenge 2016
Not-So-Linked Solution to the Linked Data Mining Challenge 2016
 

More from Isabelle Augenstein

Beyond Fact Checking — Modelling Information Change in Scientific Communication
Beyond Fact Checking — Modelling Information Change in Scientific CommunicationBeyond Fact Checking — Modelling Information Change in Scientific Communication
Beyond Fact Checking — Modelling Information Change in Scientific Communication
Isabelle Augenstein
 
Automatically Detecting Scientific Misinformation
Automatically Detecting Scientific MisinformationAutomatically Detecting Scientific Misinformation
Automatically Detecting Scientific Misinformation
Isabelle Augenstein
 
Accountable and Robust Automatic Fact Checking
Accountable and Robust Automatic Fact CheckingAccountable and Robust Automatic Fact Checking
Accountable and Robust Automatic Fact Checking
Isabelle Augenstein
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
Isabelle Augenstein
 
Towards Explainable Fact Checking (DIKU Business Club presentation)
Towards Explainable Fact Checking (DIKU Business Club presentation)Towards Explainable Fact Checking (DIKU Business Club presentation)
Towards Explainable Fact Checking (DIKU Business Club presentation)
Isabelle Augenstein
 
Explainability for NLP
Explainability for NLPExplainability for NLP
Explainability for NLP
Isabelle Augenstein
 
Towards Explainable Fact Checking
Towards Explainable Fact CheckingTowards Explainable Fact Checking
Towards Explainable Fact Checking
Isabelle Augenstein
 
Tracking False Information Online
Tracking False Information OnlineTracking False Information Online
Tracking False Information Online
Isabelle Augenstein
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...
Isabelle Augenstein
 
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Isabelle Augenstein
 
Learning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyondLearning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyond
Isabelle Augenstein
 
Learning to read for automated fact checking
Learning to read for automated fact checkingLearning to read for automated fact checking
Learning to read for automated fact checking
Isabelle Augenstein
 
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
Isabelle Augenstein
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
Isabelle Augenstein
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
Isabelle Augenstein
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Isabelle Augenstein
 
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Isabelle Augenstein
 

More from Isabelle Augenstein (17)

Beyond Fact Checking — Modelling Information Change in Scientific Communication
Beyond Fact Checking — Modelling Information Change in Scientific CommunicationBeyond Fact Checking — Modelling Information Change in Scientific Communication
Beyond Fact Checking — Modelling Information Change in Scientific Communication
 
Automatically Detecting Scientific Misinformation
Automatically Detecting Scientific MisinformationAutomatically Detecting Scientific Misinformation
Automatically Detecting Scientific Misinformation
 
Accountable and Robust Automatic Fact Checking
Accountable and Robust Automatic Fact CheckingAccountable and Robust Automatic Fact Checking
Accountable and Robust Automatic Fact Checking
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
Towards Explainable Fact Checking (DIKU Business Club presentation)
Towards Explainable Fact Checking (DIKU Business Club presentation)Towards Explainable Fact Checking (DIKU Business Club presentation)
Towards Explainable Fact Checking (DIKU Business Club presentation)
 
Explainability for NLP
Explainability for NLPExplainability for NLP
Explainability for NLP
 
Towards Explainable Fact Checking
Towards Explainable Fact CheckingTowards Explainable Fact Checking
Towards Explainable Fact Checking
 
Tracking False Information Online
Tracking False Information OnlineTracking False Information Online
Tracking False Information Online
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...
 
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
 
Learning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyondLearning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyond
 
Learning to read for automated fact checking
Learning to read for automated fact checkingLearning to read for automated fact checking
Learning to read for automated fact checking
 
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
 
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...
 

Recently uploaded

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Lodifier: Generating Linked Data from Unstructured Text

  • 1. LODifier: Generating Linked Data from Unstructured Text Isabelle Augenstein1 Sebastian Pad´o1 Sebastian Rudolph2 1Department of Computational Linguistics, Universit¨at Heidelberg, Germany 2Institute AIFB, Karlsruhe Institute of Technology, Germany {augenste,pado}@cl.uni-heidelberg.de, rudolph@kit.edu Extended Semantic Web Conference, 2012 Augenstein, Pad´o, Rudolph LODifier ESWC 2012 1 / 28
  • 2. Outline 1 Motivation 2 The LODifier Architecture and Workflow 3 Evaluation in a Document Similarity Task 4 Conclusion Augenstein, Pad´o, Rudolph LODifier ESWC 2012 2 / 28
  • 3. Introduction Introduction What? Represent information from NL text as Linked Data Creation of graph-based representation for unstructured plain text Why? Task is easy for structured text, but quite challenging for unstructured text Most other approaches are very selective, e.g., extract only pre-defined relations Our approach translates the text in its entirety Possible use cases: Domain-independent scenarios in which no pre-defined schema is available Augenstein, Pad´o, Rudolph LODifier ESWC 2012 3 / 28
  • 4. Introduction Introduction How? Use noise-tolerant NLP pipeline to analyze text Translate result into RDF representation Embed output in the LOD cloud by using vocabulary from DBPedia and Wordnet Augenstein, Pad´o, Rudolph LODifier ESWC 2012 4 / 28
  • 5. System Overview Pipeline http: //ww w.w3 .org/ RDF /icon s/rdf _flye r.svg 1 of 1 08.12 .11 23:48 lemmatized text NE-marked text parsed text word sense URIs parsed text DRS tokenized text DBpedia URIs LOD-linked RDF unstructured text Named Entity Recognition (Wikifier) Parsing (C&C) Deep Semantic Analysis (Boxer) Lemmatization Word Sense Disambiguation (UKB) Tokenization RDF Generation and Integration Augenstein, Pad´o, Rudolph LODifier ESWC 2012 5 / 28
  • 6. System NLP analysis steps NLP analysis steps (NER) Named Entity Recognition (Wikifier 1): Named entities are recognized using Wikifier Wikifier finds Wikipedia links for named entities Wikipedia URLs are converted to DBPedia URIs The New York Times reported that John McCarthy died. He invented the programming language LISP. Figure: Test sentences [[The New York Times]] reported that [[John McCarthy (computer scientist)|John McCarthy]] died. He invented the [[Programming language|programming language]] [[Lisp (programming language)| Lisp]]. Figure: Wikifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 6 / 28
  • 7. System NLP analysis steps NLP analysis steps (WSD) Word Sense Disambiguation (UKB 2)): Words are disambiguated using UKB, an unsupervised graph-based WSD tool UKB finds WordNet links for word senses UKB output is converted to RDF WordNet URIs ctx_0 w1 00965687-v !! report ctx_0 w4 00358431-v !! die ctx_1 w7 01634424-v !! invent ctx_1 w9 06898352-n !! programming_language ctx_1 w10 06901936-n !! lisp Figure: UKB output for the test sentences. 2 http://ixa2.si.ehu.es/ukb/ Augenstein, Pad´o, Rudolph LODifier ESWC 2012 7 / 28
  • 8. System NLP analysis steps NLP analysis steps (Parsing and Semantic Analysis) Parsing (C&C 3): Sentences are parsed with the C&C parser The C&C parser is a statistical parser that uses the combined categorial grammar (CCG) CCG is a semantically motivated grammar, making it suitable for further semantic analysis Deep Semantic Analysis (Boxer 4): Boxer builds on the output of C&C and produces discourse representation structures (DRSs) DRSs model the meaning of texts in terms of the relevant entities (discourse referents) and relations between them (conditions) 3 http://svn.ask.it.usyd.edu.au/trac/candc/ 4 http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer Augenstein, Pad´o, Rudolph LODifier ESWC 2012 8 / 28
  • 9. System NLP analysis steps ______________________________ _______________________ | x0 x1 x2 x3 | | x4 x5 x6 | |..............................| |.......................| (| male(x0) |+| event(x4) |) | named(x0,john_mccarthy,per) | | invent(x4) | | programming_language(x1) | | agent(x4,x0) | | nn(x1,x2) | | patient(x4,x2) | | named(x2,lisp,nam) | | event(x5) | | named(x3,new_york_times,org) | | report(x5) | |______________________________| | agent(x5,x3) | | theme(x5,x6) | | proposition(x6) | | ______________ | | | x7 | | | x6:|..............| | | | event(x7) | | | | die(x7) | | | | agent(x7,x0) | | | |______________| | |_______________________| Figure: Boxer output for the example sentences Augenstein, Pad´o, Rudolph LODifier ESWC 2012 9 / 28
  • 10. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 10 / 28
  • 11. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 11 / 28
  • 12. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 12 / 28
  • 13. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 13 / 28
  • 14. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 14 / 28
  • 15. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 15 / 28
  • 16. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 16 / 28
  • 17. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 17 / 28
  • 18. System NLP analysis steps _:var0x0 drsclass:named ne:john_mccarthy ; rdf:type drsclass:male , foaf:Person ; owl:sameAs dbpedia:John_McCarthy_(computer_scientist) . _:var0x1 rdf:type class:programming_language ; owl:sameAs dbpedia:Programming_language . _:var0x2 drsrel:nn _:var0x1 . _:var0x2 drsclass:named ne:lisp ; owl:sameAs dbpedia:Lisp_(programming_language) . _:var0x3 drsclass:named ne:the_new_york_times ; owl:sameAs dbpedia:The_New_York_Times . _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 . drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 . _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ; drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 . _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ; reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 . ] reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 . ] Figure: LODifier output for the test sentences. Augenstein, Pad´o, Rudolph LODifier ESWC 2012 18 / 28
  • 19. Evaluation Evaluation overview Evaluation overview Task: Assess document similarity Story link detection task, part of topic detection and tracking (TDT) family of tasks Data: TDT-2 benchmark dataset 5 Consists of newspaper, TV and radio news in English, Arabic and Mandarin Subset: English language, only newspaper articles 183 positive and negative document pairs 5 http://projects.ldc.upenn.edu/TDT2/ Augenstein, Pad´o, Rudolph LODifier ESWC 2012 19 / 28
  • 20. Evaluation Evaluation overview Evaluation overview Method: Define various document similarity measures Measures without structural information Measures with structural information Documents are predicted to have the same topic if they have a similarity of θ or more θ is determined with supervised learning Augenstein, Pad´o, Rudolph LODifier ESWC 2012 20 / 28
  • 21. Evaluation Evaluation method Evaluation method Similarity measures without structure: Random baseline Bag-of-words baseline: Word overlap between the two documents in a document pair Bag-of-URI baseline: URI overlap between two RDF documents Augenstein, Pad´o, Rudolph LODifier ESWC 2012 21 / 28
  • 22. Evaluation Evaluation method Evaluation method URI class variants: Three variants for URI baseline: Take the relative importance of various URI classes into account Variant 1: All NEs identified by Wikifier, all words successfully disambiguated by UKB (namespaces dbpedia:, wn30:) Variant 2: Adds all NEs recognized by Boxer (namespace ne:) Variant 3: Further adds the URIs for all words that were not recognized by Wikifier, UKB or Boxer (namespace class:) Augenstein, Pad´o, Rudolph LODifier ESWC 2012 22 / 28
  • 23. Evaluation Evaluation method Evaluation method Extended setting: Aims at drawing more information from LOD cloud into the similarity computation The generated RDF graph is enriched by: DBpedia categories (namespace dbpedia:) WordNet synsets and WordNet senses related to the URIs in the generated graph (namespace wn30:) Augenstein, Pad´o, Rudolph LODifier ESWC 2012 23 / 28
  • 24. Evaluation Evaluation method Evaluation method Similarity measures with structure: Full-fledged graph similarity measures, e.g. based on isomorphic subgraphs, are infeasible due to the size of the RDF graphs Our similarity measures are based on the shortest paths between relevant nodes Our intuition is that short paths between URI pairs denote relevant semantic information and we expect they are related by a short path in other documents as well Augenstein, Pad´o, Rudolph LODifier ESWC 2012 24 / 28
  • 25. Evaluation Evaluation method Evaluation method Similarity measures with structure: proSimk,Rel,f (G1, G2) = a,b∈Rel(G1) a,b ∈Ck (G1)∩Ck (G2) f ( (a, b)) a,b∈Rel(G1) a,b ∈Ck (G1) f ( (a, b)) k: path length, Rel: semantic relations, f : similarity function proSimcnt: Uses f ( ) = 1, i.e., just counts the number of paths irrespective of their length proSimlen: Uses f ( ) = 1/ , giving less weight to longer paths proSimsqlen: Uses f ( ) = 1/ √ , discounting long paths less aggressively than proSimlen Augenstein, Pad´o, Rudolph LODifier ESWC 2012 25 / 28
  • 26. Evaluation Results Results Model normal extended Similarity measures without structure Random Baseline 50.0 – Bag of Words 63.0 – Bag of URIs (Variant 3) 73.4 76.4 Similarity measures with structure proSimcnt (k=6, Variant 1) 77.7 77.6 proSimcnt (k=6, Variant 2) 79.2 79.0 proSimcnt (k=8, Variant 3) 82.1 81.9 proSimlen (k=6, Variant 3) 81.5 81.4 proSimsqlen (k=8, Variant 3) 81.1 80.9 Augenstein, Pad´o, Rudolph LODifier ESWC 2012 26 / 28
  • 27. Summary Summary Main contributions: Service for the LOD community that can be used in different scenarios that can profit from structural representation of text Combination of well-established concepts and systems in a deep semantic pipeline Linking existing NLP technologies to the LOD cloud Evaluation of LODifier in a topic link detection task gives clear evidence of its potential Possible future work directions: Instead of a pipeline approach, simultaneously consider information from the NLP components to allow interaction between them Test LODifier in different scenarios Augenstein, Pad´o, Rudolph LODifier ESWC 2012 27 / 28
  • 28. Summary Thank you! Augenstein, Pad´o, Rudolph LODifier ESWC 2012 28 / 28