KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association
1 Institute of Applied Informatics and Formal Description Methods (AIFB), Karlsruhe, Germany
www.kit.edu
A language-independent method for the extraction of RDF verbalization templates
Basil Ell,1 Andreas Harth1
8th International Natural Language Generation Conference
20 June 2014, Philadelphia, PA, USA
Institute of Applied Informatics and Formal Description Methods (AIFB)
Motivation
More and more data openly available as RDF
Basil Ell - A language-independent method for the extraction of RDF verbalization templates
Linked Open Data initiative
Search engine: keywords, questions, etc. as input; text generated via NLG as output
Encyclopedia or Google Knowledge Graph: textual description of a thing generated via NLG
Example RDF data - Triples
Subject Predicate Object
dbr:Curtain_(Novel) dbo:author dbr:Agatha_Christie
dbr:Curtain_(Novel) rdf:type dbo:Book
dbr:Curtain_(Novel) rdfs:label "Curtain (novel)"@en
dbr:Curtain_(Novel) dbp:releaseDate "September 1975"@en
dbr:Agatha_Christie rdf:type dbo:Writer
dbr:Agatha_Christie rdfs:label "Agatha Christie"@en
dbo:Book rdfs:label "book"@en
Example RDF data - Graph
Overview
Motivation
RDF Verbalization Templates
Automatic Template Extraction
Evaluation
Related Work
Summary
RDF VERBALIZATION TEMPLATES
RDF Verbalization Template (1/2)
Graph pattern (GP)
Sentence pattern (SP)
RDF Verbalization Template (2/2)
GP represented as SPARQL query:
    SELECT ?book_label ?book_type_label ?author_label ?book_rD
    WHERE {
      ?book dbo:author ?author .
      ?book dbp:releaseDate ?book_rD .
      ?book rdf:type ?book_type .
      ?book_type rdfs:label ?book_type_label .
      ?book rdfs:label ?book_label .
      ?author rdfs:label ?author_label .
      ?author rdf:type dbo:Writer .
    }
RDF data:
    Subject Predicate Object
    dbr:Curtain_(Novel) dbo:author dbr:Agatha_Christie
    dbr:Curtain_(Novel) rdf:type dbo:Book
    dbr:Curtain_(Novel) rdfs:label "Curtain (novel)"@en
    dbr:Curtain_(Novel) dbp:releaseDate "September 1975"@en
    dbr:Agatha_Christie rdf:type dbo:Writer
    dbr:Agatha_Christie rdfs:label "Agatha Christie"@en
    dbo:Book rdfs:label "book"@en
Query results:
    book_label = "Curtain (novel)"
    book_type_label = "book"
    author_label = "Agatha Christie"
    book_rD = "September 1975"
Verbalization result:
    Curtain is a book by Agatha Christie published in September 1975.
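To make the interplay of graph pattern and sentence pattern concrete, here is a minimal Python sketch that evaluates the graph pattern over the example triples and fills a sentence pattern with the result. It is not the authors' implementation: the naive matcher stands in for a SPARQL engine, language tags are dropped, and the sentence pattern string and the `strip_parenthetical` modifier are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# RDF data from the slide as plain tuples (language tags omitted for brevity).
TRIPLES: List[Triple] = [
    ("dbr:Curtain_(Novel)", "dbo:author", "dbr:Agatha_Christie"),
    ("dbr:Curtain_(Novel)", "rdf:type", "dbo:Book"),
    ("dbr:Curtain_(Novel)", "rdfs:label", "Curtain (novel)"),
    ("dbr:Curtain_(Novel)", "dbp:releaseDate", "September 1975"),
    ("dbr:Agatha_Christie", "rdf:type", "dbo:Writer"),
    ("dbr:Agatha_Christie", "rdfs:label", "Agatha Christie"),
    ("dbo:Book", "rdfs:label", "book"),
]

# Graph pattern (GP): the WHERE clause of the SPARQL query on the slide.
GRAPH_PATTERN: List[Triple] = [
    ("?book", "dbo:author", "?author"),
    ("?book", "dbp:releaseDate", "?book_rD"),
    ("?book", "rdf:type", "?book_type"),
    ("?book_type", "rdfs:label", "?book_type_label"),
    ("?book", "rdfs:label", "?book_label"),
    ("?author", "rdfs:label", "?author_label"),
    ("?author", "rdf:type", "dbo:Writer"),
]

# Sentence pattern (SP): an assumed reconstruction, not taken from the slides.
SENTENCE_PATTERN = (
    "{book_label} is a {book_type_label} by {author_label} "
    "published in {book_rD}."
)


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so the triple pattern equals the triple, else None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to a different term
        elif p != t:
            return None  # constant term does not match
    return new


def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all triple patterns (naive join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(patterns[1:], triples, extended)


def strip_parenthetical(label: str) -> str:
    """Hypothetical modifier: 'Curtain (novel)' becomes 'Curtain'."""
    return label.split(" (")[0]


def verbalize(binding: Binding) -> str:
    """Fill the sentence pattern with the values bound by the graph pattern."""
    values = {name.lstrip("?"): value for name, value in binding.items()}
    values["book_label"] = strip_parenthetical(values["book_label"])
    return SENTENCE_PATTERN.format(**values)


sentences = [verbalize(b) for b in match(GRAPH_PATTERN, TRIPLES)]
print(sentences)
```

In practice the graph pattern would be sent to a SPARQL endpoint; the hand-written matcher only keeps the sketch self-contained.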
AUTOMATIC TEMPLATE EXTRACTION
Template Extraction (1/6) - Overview
Input: parallel text-data corpus. Output: RDF verbalization templates.
1. Sentence Collection
2. Text-Data Alignment
3. Abstraction
4. Grouping
5. Pattern Mining
6. Template Creation
Experiment:
Text from Wikipedia
Data from DBpedia
10 virtual machines (8 vCPUs, 8 GB RAM, 40 GB disk)
Extraction ran for 2 weeks
Template Extraction (2/6) - Features
Distant-supervised: no hand-labeled training data required
Simultaneous multi-relation learning: simultaneously learns all relations in a sentence
Frequent maximal subgraph pattern mining: identifies commonalities among RDF graph patterns
Language independent: does not rely on syntactic parsing
Example Template (1/2)
Example Template (2/2)
Template Extraction (3/6) - Alignment
Figure: a sentence aligned with the entities and literals of an RDF graph via their labels, using modifiers m1 to m5.
Legend: entity, literal, identified entity, identified literal, modifier (m1), matched string.
Language-independent approach: no syntactic parsing.
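The alignment step can be illustrated with a small Python sketch that looks for labels of entities and literals in a sentence without any parsing. This is only an assumed approximation: plain substring search replaces whatever matching the authors use, and the two modifiers (`identity`, `drop_parenthetical`) are hypothetical stand-ins for m1 to m5.

```python
from typing import Callable, Dict, List, NamedTuple


class Match(NamedTuple):
    node: str      # entity IRI or literal whose label was found
    modifier: str  # name of the modifier that produced the matched string
    start: int     # character offset where the matched string begins
    end: int       # character offset just past the matched string


def identity(label: str) -> str:
    """Use the label unchanged."""
    return label


def drop_parenthetical(label: str) -> str:
    """Remove a trailing parenthetical: 'Curtain (novel)' becomes 'Curtain'."""
    return label.split(" (")[0]


# Hypothetical modifiers, tried in this order.
MODIFIERS: Dict[str, Callable[[str], str]] = {
    "identity": identity,
    "drop_parenthetical": drop_parenthetical,
}


def align(sentence: str, labels: Dict[str, str]) -> List[Match]:
    """Find each node's (possibly modified) label in the sentence."""
    matches = []
    for node, label in labels.items():
        for name, modify in MODIFIERS.items():
            candidate = modify(label)
            start = sentence.find(candidate) if candidate else -1
            if start >= 0:
                matches.append(Match(node, name, start, start + len(candidate)))
                break  # keep the first modifier that yields a match
    return sorted(matches, key=lambda m: m.start)


SENTENCE = "Curtain is a book by Agatha Christie published in September 1975."
LABELS = {
    "dbr:Curtain_(Novel)": "Curtain (novel)",
    "dbr:Agatha_Christie": "Agatha Christie",
    "dbo:Book": "book",
    '"September 1975"': "September 1975",
}

alignment = align(SENTENCE, LABELS)
for m in alignment:
    print(m.node, m.modifier, repr(SENTENCE[m.start:m.end]))
```

Substring search is deliberately crude: a short label such as "book" would also match inside longer words, so a real system needs stricter boundary handling.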
Template Extraction (4/6) – Abstraction
Abstraction 1: Hypothesis graph pattern 1, pattern 1
Abstraction 2: Hypothesis graph pattern 2, pattern 2
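As a rough illustration of abstraction, the following Python sketch replaces matched strings in a sentence with variables and applies the same variables to the triples that connect the matched nodes. The span offsets, the V1, V2, ... numbering and the function names are assumptions for this example; the slides do not show how the authors number variables or build hypothesis graph patterns.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, matched node)
Triple = Tuple[str, str, str]


def abstract_sentence(sentence: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each matched span with a variable; spans must not overlap."""
    pieces: List[str] = []
    variables: Dict[str, str] = {}
    cursor = 0
    for i, (start, end, node) in enumerate(sorted(spans), start=1):
        var = f"V{i}"
        variables[var] = node
        pieces.append(sentence[cursor:start])  # text between matches is kept
        pieces.append("{" + var + "}")
        cursor = end
    pieces.append(sentence[cursor:])
    return "".join(pieces), variables


def abstract_triples(triples: List[Triple], variables: Dict[str, str]) -> List[Triple]:
    """Turn triples into triple patterns by substituting the same variables."""
    node_to_var = {node: "?" + var for var, node in variables.items()}
    return [
        (node_to_var.get(s, s), node_to_var.get(p, p), node_to_var.get(o, o))
        for s, p, o in triples
    ]


SENTENCE = "Curtain is a book by Agatha Christie published in September 1975."
SPANS: List[Span] = [
    (0, 7, "dbr:Curtain_(Novel)"),
    (21, 36, "dbr:Agatha_Christie"),
    (50, 64, '"September 1975"'),
]
TRIPLES: List[Triple] = [
    ("dbr:Curtain_(Novel)", "dbo:author", "dbr:Agatha_Christie"),
    ("dbr:Curtain_(Novel)", "dbp:releaseDate", '"September 1975"'),
]

pattern, variables = abstract_sentence(SENTENCE, SPANS)
hypothesis = abstract_triples(TRIPLES, variables)
print(pattern)
print(hypothesis)
```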
Template Extraction (5/6) - Grouping
Group graph patterns with equivalent sentence patterns:
'"{V1}" is a short story by {V2}.':
    abstraction-64451-1
    abstraction-88393-1
    abstraction-4732-1
    abstraction-50480-1
'"{V1}" is a single by American {V9} {V4} {V8}.':
    abstraction-22205-1
    abstraction-22205-3
    abstraction-72533-1
    abstraction-127891-2
'{V1} (born {V2}) is a German footballer.':
    abstraction-86372-1
    abstraction-86415-1
    abstraction-135340-5
    abstraction-140464-2
Each group lists the hypothesis graph patterns with an equivalent sentence pattern.
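Grouping by sentence pattern can be sketched in a few lines of Python. The abstraction identifiers are taken from the slide; treating "equivalent" as exact string equality and the helper names are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (abstraction id, sentence pattern) pairs; ids as shown on the slide.
ABSTRACTIONS: List[Tuple[str, str]] = [
    ("abstraction-64451-1", '"{V1}" is a short story by {V2}.'),
    ("abstraction-86372-1", "{V1} (born {V2}) is a German footballer."),
    ("abstraction-88393-1", '"{V1}" is a short story by {V2}.'),
    ("abstraction-86415-1", "{V1} (born {V2}) is a German footballer."),
]


def group_by_sentence_pattern(items: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect abstraction ids under their (string-identical) sentence pattern."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for abstraction_id, sentence_pattern in items:
        groups[sentence_pattern].append(abstraction_id)
    return dict(groups)


def groups_with_min_size(groups: Dict[str, List[str]], min_size: int) -> Dict[str, List[str]]:
    """Keep only groups that contain at least min_size abstractions."""
    return {sp: ids for sp, ids in groups.items() if len(ids) >= min_size}


groups = group_by_sentence_pattern(ABSTRACTIONS)
for sentence_pattern, ids in groups.items():
    print(sentence_pattern, ids)
```

A size filter such as `groups_with_min_size(groups, 5)` corresponds to keeping only groups with at least five members before pattern mining.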
Template Extraction (6/6) - fmSpan
fmSpan - Frequent maximal subgraph pattern mining
Input:
Set of graph patterns
Minimal coverage value: c
Output: Set of graph patterns
Each output graph pattern
Is a subgraph of at least c input graph patterns (→ frequent)
Cannot be extended while maintaining coverage (→ maximal)
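The two output conditions (frequent and maximal) can be shown with a simplified Python sketch. It is not fmSpan itself: it assumes that the variables of all graph patterns in a group are already aligned, so that a graph pattern can be treated as a set of triple patterns and "subgraph" reduces to "subset". Connectivity of the result is not enforced, and the example patterns are made up.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

TriplePattern = Tuple[str, str, str]
GraphPattern = FrozenSet[TriplePattern]


def frequent_maximal(patterns: List[GraphPattern], c: int) -> Set[GraphPattern]:
    """Return the maximal triple-pattern sets contained in at least c inputs.

    Every maximal frequent set equals the intersection of some c inputs,
    so it suffices to intersect all c-combinations and keep the maximal ones.
    """
    if c < 1:
        raise ValueError("coverage c must be at least 1")
    candidates: Set[GraphPattern] = set()
    for combo in combinations(patterns, c):
        common = frozenset.intersection(*combo)
        if common:  # empty intersections carry no pattern
            candidates.add(common)
    # Drop candidates that are proper subsets of another candidate.
    return {p for p in candidates if not any(p < q for q in candidates)}


AUTHOR = ("?V1", "dbo:author", "?V2")
IS_BOOK = ("?V1", "rdf:type", "dbo:Book")
IS_WRITER = ("?V2", "rdf:type", "dbo:Writer")
PUBLISHER = ("?V1", "dbo:publisher", "?V3")

A = frozenset({AUTHOR, IS_BOOK, IS_WRITER})
B = frozenset({AUTHOR, IS_BOOK})
C = frozenset({AUTHOR, IS_WRITER, PUBLISHER})

result_c2 = frequent_maximal([A, B, C], c=2)
result_c3 = frequent_maximal([A, B, C], c=3)
print(result_c2)
print(result_c3)
```

Enumerating all c-combinations is exponential in the worst case, so this is only suitable for small groups.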
EVALUATION
Evaluation (1/4) - Experiment
Parallel text-data corpus:
88,708,622 triples
4,004,478 English documents
716,049 German documents
3,811,992 English sentences
794,040 German sentences
3,434,108 abstracted English sentences
530,766 abstracted German sentences
(with at least two identified entities)

Language  #groups≥5  #templates  #all groups
en        4,569      3,816       686,687
de        2,130      1,250       269,551
Evaluation (2/4) - Coverage
How often can a template be applied?
About 300 templates can each be used to verbalize between 10,000 and 100,000 subgraphs.
Chart: number of templates (#en, #de; axis from 0 to 350) per number of subgraphs a template can verbalize, in the ranges 1–10, 10–100, 100–1,000, 1,000–10,000, 10,000–100,000, 100,000–1,000,000, 1,000,000–10,000,000, and 10,000,000–100,000,000.
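The coverage ranges are powers of ten, so a histogram of this kind can be computed as in the following Python sketch. The per-template counts are invented for illustration, and treating each range as half-open (lower bound included, upper bound excluded) is an assumption, since the slide does not say how boundary values are assigned.

```python
from collections import Counter
from typing import Dict


def coverage_bin(count: int) -> str:
    """Return the power-of-ten range [lower, lower * 10) that contains count."""
    if count < 1:
        raise ValueError("count must be at least 1")
    lower = 1
    while count >= lower * 10:
        lower *= 10
    return f"{lower:,}–{lower * 10:,}"


# Hypothetical numbers of subgraphs that each template can verbalize.
MATCH_COUNTS: Dict[str, int] = {
    "template-a": 7,
    "template-b": 12_000,
    "template-c": 99_999,
    "template-d": 100_000,
}

histogram = Counter(coverage_bin(n) for n in MATCH_COUNTS.values())
for label, templates in histogram.items():
    print(label, templates)
```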
Evaluation (3/4)
User study: 10 English templates, 10 German templates, 6 evaluators, 200 verbalizations
Accuracy (1): Is everything that is expressed in the graph pattern also expressed in the sentence pattern?
Measured for each triple pattern within the GP:
(1) The triple pattern is explicitly expressed
(2) The triple pattern is implied
(3) The triple pattern is not expressed
(4) Unsure
Accuracy (2): Is everything that is expressed in the sentence pattern also expressed in the graph pattern?
(1) Everything is expressed
(2) Most things are expressed
(3) Some things are expressed
(4) Nothing is expressed
Charts: answer counts per category (1) to (4), for en and de.
Evaluation (4/4)
Syntactical Correctness: How syntactically correct are verbalizations?
(1) Completely syntactically correct
(2) Almost syntactically correct
(3) Some syntactical errors
(4) Strongly syntactically incorrect
Understandability: How understandable are verbalizations?
(1) The meaning is clear
(2) The meaning is clear, but there are some problems in word usage and/or style
(3) The basic thrust is clear, but the evaluator is not sure of some detailed parts because of word usage problems
(4) Contains many word usage problems, and the evaluator can only guess at the meaning
(5) Cannot be understood at all
Charts: answer counts per category, (1) to (4) for syntactical correctness and (1) to (5) for understandability, for en and de.
RELATED WORK
Related Work (1/4)
(Welty et al., 2010)
Focus on IE
Input sentences are parsed
Regards relations between proper nouns only
Does not consider a graph of relations
Related Work (2/4)
(Duma and Klein, 2013)
Focus on NLG
Related Work (3/4)
(Gerber and Ngomo, 2011)
Focus on IE
< ’s acquisition of > pattern for property subsidiary
“Google’s acquisition of Youtube comes as online video is really starting to hit its stride.”
Relation expressed by the string between entities
Related Work (4/4)
Distant supervision
(Craven and Kumlien, 1999), (Bunescu and Mooney, 2007), (Carlson et al., 2009), (Mintz et al., 2009), (Welty et al., 2010), (Hoffmann et al., 2011), (Surdeanu et al., 2012)
Simultaneous multi-relation learning
(Carlson et al., 2009)
SUMMARY
Summary
Introduced RDF verbalization templates
Introduced template extraction approach
Distant-supervised
Language independent
Simultaneous multi-relation learning
Frequent maximal subgraph pattern mining
Evaluation
Large parallel text-data corpus for en and de
Good syntactical correctness & understandability
Accuracy needs to be improved in future work
Thank you for your attention!
The authors acknowledge the support of the European Commission's Seventh Framework Programme
FP7-ICT-2011-7 (XLike, Grant 288342).
http://km.aifb.kit.edu/sites/bridge-patterns/INLG2014/
References
Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Annual meeting-association for Computational Linguistics, volume 45, pages 576–583.
Andrew Carlson, Justin Betteridge, Estevam R Hruschka Jr, and Tom M Mitchell. 2009. Coupling semi-supervised learning of categories and relations. In Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for Natural Language Processing, pages 1–9. Association for Computational Linguistics.
Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Thomas Lengauer, Reinhard Schneider, Peer Bork, Douglas L. Brutlag, Janice I. Glasgow, Hans-Werner Mewes, and Ralf Zimmer, editors, ISMB, pages 77–86. AAAI.
Daniel Duma and Ewan Klein. 2013. Generating Natural Language from Linked Data: Unsupervised template extraction, pages 83–94. Association for Computational Linguistics, Potsdam, Germany.
Daniel Gerber and A-C Ngonga Ngomo. 2011. Bootstrapping the linked data web. In 1st Workshop on Web Scale Knowledge Extraction @ International Semantic Web Conference, volume 2011.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 541–550. Association for Computational Linguistics.
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - ACL-IJCNLP 09, pages 1003–1011.
Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 455–465. Association for Computational Linguistics.
Chris Welty, James Fan, David Gondek, and Andrew Schlaikjer. 2010. Large scale relation detection. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 24–33. Association for Computational Linguistics.

Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 

Recently uploaded (20)

Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 

A language-independent method for the extraction of RDF verbalization templates

  • 1. KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association 1 Institute of Applied Informatics and Formal Description Methods (AIFB), Karlsruhe, Germany www.kit.edu A language-independent method for the extraction of RDF verbalization templates Basil Ell,1 Andreas Harth1 8th International Natural Language Generation Conference 20 June 2014, Philadelphia, PA, USA
  • 2. Motivation. More and more data is openly available as RDF (Linked Open Data initiative).
  • 3. Motivation. More and more data is openly available as RDF (Linked Open Data initiative). [Figure labels: Search Engine; keywords, questions, etc.; Text; NLG]
  • 4. Motivation. More and more data is openly available as RDF (Linked Open Data initiative). [Figure labels: Search Engine; keywords, questions, etc.; Text; NLG; Encyclopedia or Google Knowledge Graph; Textual description of a thing; NLG]
  • 5. Example RDF data - Triples (Subject Predicate Object):
      dbr:Curtain_(Novel) dbo:author dbr:Agatha_Christie
      dbr:Curtain_(Novel) rdf:type dbo:Book
      dbr:Curtain_(Novel) rdfs:label "Curtain (novel)"@en
      dbr:Curtain_(Novel) dbp:releaseDate "September 1975"@en
      dbr:Agatha_Christie rdf:type dbo:Writer
      dbr:Agatha_Christie rdfs:label "Agatha Christie"@en
      dbo:Book rdfs:label "book"@en
  • 6. Example RDF data - Graph [figure]
  • 7. Overview: Motivation; RDF Verbalization Templates; Automatic Template Extraction; Evaluation; Related Work; Summary.
  • 8. RDF VERBALIZATION TEMPLATES
  • 9–10. RDF Verbalization Template (1/2): Graph pattern (GP); Sentence pattern (SP). [figure]
  • 11. RDF Verbalization Template (2/2)
      GP represented as SPARQL query:
        SELECT ?book_label ?book_type_label ?author_label ?book_rD
        WHERE {
          ?book dbo:author ?author .
          ?book dbp:releaseDate ?book_rD .
          ?book rdf:type ?book_type .
          ?book_type rdfs:label ?book_type_label .
          ?book rdfs:label ?book_label .
          ?author rdfs:label ?author_label .
          ?author rdf:type dbo:Writer .
        }
      RDF data (Subject Predicate Object):
        dbr:Curtain_(Novel) dbo:author dbr:Agatha_Christie
        dbr:Curtain_(Novel) rdf:type dbo:Book
        dbr:Curtain_(Novel) rdfs:label "Curtain (novel)"@en
        dbr:Curtain_(Novel) dbp:releaseDate "September 1975"@en
        dbr:Agatha_Christie rdf:type dbo:Writer
        dbr:Agatha_Christie rdfs:label "Agatha Christie"@en
        dbo:Book rdfs:label "book"@en
      Query results: book_label = "Curtain (novel)"; book_type_label = "book"; author_label = "Agatha Christie"; book_rD = "September 1975"
      Verbalization result: Curtain is a book by Agatha Christie published in September 1975.
  • 12. AUTOMATIC TEMPLATE EXTRACTION
  • 13. Template Extraction (1/6) - Overview. From a parallel text-data corpus to RDF verbalization templates: 1. Sentence Collection 2. Text-Data Alignment 3. Abstraction 4. Grouping 5. Pattern Mining 6. Template Creation
  • 14. Template Extraction (1/6) - Overview. From a parallel text-data corpus to RDF verbalization templates: 1. Sentence Collection 2. Text-Data Alignment 3. Abstraction 4. Grouping 5. Pattern Mining 6. Template Creation. Experiment: text from Wikipedia, data from DBpedia; 10 virtual machines, 8 vCPUs, 8 GB RAM, 40 GB disk; extraction ran for 2 weeks.
  • 15. Template Extraction (2/6) - Features. Distant-supervised: no hand-labeled training data required. Simultaneous multi-relation learning: all relations in a sentence are learned simultaneously. Frequent maximal subgraph pattern mining: identifies commonalities among RDF graph patterns. Language independent: does not rely on syntactic parsing.
  • 16. Example Template (1/2) [figure]
  • 17. Example Template (2/2) [figure]
  • 18–28. Template Extraction (3/6) - Alignment [Figure, built up step by step over eleven slides: a sentence aligned with RDF data. Legend: entity; literal; identified entity (i); identified literal (i); modifier (m1 to m5); matched string; label.] Language-independent approach: no syntactic parsing.
  • 29. Template Extraction (4/6) - Abstraction [Figure labels: Abstraction 1; Abstraction 2; Hypothesis graph pattern 1; Hypothesis graph pattern 2; pattern 1; pattern 2]
  • 30. Template Extraction (5/6) - Grouping. Group graph patterns with equivalent sentence patterns. Hypothesis graph patterns with equivalent sentence pattern:
      '"{V1}" is a short story by {V2}.': abstraction-64451-1, abstraction-88393-1, abstraction-4732-1, abstraction-50480-1
      '"{V1}" is a single by American {V9} {V4} {V8}.': abstraction-22205-1, abstraction-22205-3, abstraction-72533-1, abstraction-127891-2
      '{V1} (born {V2}) is a German footballer.': abstraction-86372-1, abstraction-86415-1, abstraction-135340-5, abstraction-140464-2
  • 31. Template Extraction (6/6) - fmSpan. fmSpan: frequent maximal subgraph pattern mining. Input: a set of graph patterns and a minimal coverage value c. Output: a set of graph patterns, each of which is a subgraph of at least c input graph patterns (→ frequent) and cannot be extended while maintaining coverage (→ maximal).
  • 32. EVALUATION
  • 33. Evaluation (1/4) - Experiment. Parallel text-data corpus: 88,708,622 triples; 4,004,478 English documents; 716,049 German documents; 3,811,992 English sentences; 794,040 German sentences; 3,434,108 abstracted English sentences; 530,766 abstracted German sentences (with at least two identified entities).
          #groups≥5  #templates  #all groups
      en  4569       3816        686,687
      de  2130       1250        269,551
  • 34. Evaluation (2/4) - Coverage. How often can a template be applied? About 300 templates can each be used to verbalize between 10,000 and 100,000 subgraphs. [Bar chart: number of templates (#en, #de; axis 0 to 350) per coverage bin: 1–10, 10–100, 100–1000, 1000–10,000, 10,000–100,000, 100,000–1,000,000, 1,000,000–10,000,000, 10,000,000–100,000,000]
  • 35. Evaluation (3/4). User study: 10 English templates, 10 German templates, 6 evaluators, 200 verbalizations.
      Accuracy (1): Is everything that is expressed in the graph pattern also expressed in the sentence pattern? Measured for each triple pattern within the GP: (1) The triple pattern is explicitly expressed (2) The triple pattern is implied (3) The triple pattern is not expressed (4) Unsure
      Accuracy (2): Is everything that is expressed in the sentence pattern also expressed in the graph pattern? (1) Everything is expressed (2) Most things are expressed (3) Some things are expressed (4) Nothing is expressed
      [Bar charts: Accuracy (1), en and de, axis 0 to 200; Accuracy (2), en and de, axis 0 to 20]
  • 36. Evaluation (4/4).
      Syntactical correctness: How syntactically correct are verbalizations? (1) Completely syntactically correct (2) Almost syntactically correct (3) Some syntactical errors (4) Strongly syntactically incorrect
      Understandability: How understandable are verbalizations? (1) The meaning is clear (2) The meaning is clear, but there are some problems in word usage and/or style (3) The basic thrust is clear, but the evaluator is not sure of some detailed parts because of word usage problems (4) Contains many word usage problems, and the evaluator can only guess at the meaning (5) Cannot be understood at all
      [Bar charts: Syntactical Correctness, en and de, axis 0 to 250; Understandability, en and de, axis 0 to 300]
  • 37. RELATED WORK
  • 38. Related Work (1/4): (Welty et al., 2010). Focus on IE. Input sentences are parsed. Regards relations between proper nouns only. Does not consider a graph of relations.
  • 39. Related Work (2/4): (Duma and Klein, 2013). Focus on NLG.
  • 40. Related Work (3/4): (Gerber and Ngomo, 2011). Focus on IE. Pattern < ’s acquisition of > for property subsidiary, e.g. “Google’s acquisition of Youtube comes as online video is really starting to hit its stride.” The relation is expressed by the string between the entities.
  • 41. Related Work (4/4). Distant supervision: (Craven and Kumlien, 1999), (Bunescu and Mooney, 2007), (Carlson et al., 2009), (Mintz et al., 2009), (Welty et al., 2010), (Hoffmann et al., 2011), (Surdeanu et al., 2012). Simultaneous multi-relation learning: (Carlson et al., 2009).
  • 42. SUMMARY
  • 43. Summary. Introduced RDF verbalization templates. Introduced a template extraction approach: distant-supervised; language independent; simultaneous multi-relation learning; frequent maximal subgraph pattern mining. Evaluation: large parallel text-data corpus for en and de; good syntactical correctness and understandability; accuracy needs to be improved in future work.
  • 44. Thank you for your attention! The authors acknowledge the support of the European Commission's Seventh Framework Programme FP7-ICT-2011-7 (XLike, Grant 288342). http://km.aifb.kit.edu/sites/bridge-patterns/INLG2014/
  • 45. References (1/2)
      Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Annual Meeting of the Association for Computational Linguistics, volume 45, pages 576–583.
      Andrew Carlson, Justin Betteridge, Estevam R Hruschka Jr, and Tom M Mitchell. 2009. Coupling semi-supervised learning of categories and relations. In Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for Natural Language Processing, pages 1–9. Association for Computational Linguistics.
      Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Thomas Lengauer, Reinhard Schneider, Peer Bork, Douglas L. Brutlag, Janice I. Glasgow, Hans-Werner Mewes, and Ralf Zimmer, editors, ISMB, pages 77–86. AAAI.
      Daniel Duma and Ewan Klein. 2013. Generating Natural Language from Linked Data: Unsupervised template extraction, pages 83–94. Association for Computational Linguistics, Potsdam, Germany.
      Daniel Gerber and A-C Ngonga Ngomo. 2011. Bootstrapping the linked data web. In 1st Workshop on Web Scale Knowledge Extraction @ International Semantic Web Conference, volume 2011.
      Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 541–550. Association for Computational Linguistics.
  • 46. References (2/2)
      Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - ACL-IJCNLP 09, pages 1003–1011.
      Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 455–465. Association for Computational Linguistics.
      Chris Welty, James Fan, David Gondek, and Andrew Schlaikjer. 2010. Large scale relation detection. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 24–33. Association for Computational Linguistics.
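
The grouping and pattern mining steps (slides 30 and 31) can be illustrated with a minimal Python sketch. It is not the authors' fmSpan implementation. It assumes that variables are already named consistently across graph patterns, so "is a subgraph of" reduces to a plain subset test on triple patterns, whereas real subgraph mining has to handle variable renaming. The function names and the toy abstractions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group_by_sentence_pattern(abstractions):
    """Group hypothesis graph patterns that share an identical sentence pattern."""
    groups = defaultdict(list)
    for sentence_pattern, graph_pattern in abstractions:
        groups[sentence_pattern].append(frozenset(graph_pattern))
    return dict(groups)


def frequent_maximal_subpatterns(graph_patterns, c):
    """Return the sets of triple patterns that are contained in at least c of the
    given graph patterns (frequent) and that cannot be extended without dropping
    below that coverage (maximal).

    Simplifying assumption: containment is a subset test, no variable renaming.
    """
    patterns = [frozenset(g) for g in graph_patterns]
    # Every frequent set is contained in the intersection of some c patterns,
    # so the maximal frequent sets are the maximal elements among these intersections.
    candidates = {
        frozenset.intersection(*combo) for combo in combinations(patterns, c)
    }
    candidates.discard(frozenset())
    return [p for p in candidates if not any(p < q for q in candidates)]


# Toy triple patterns (illustrative only, built from the vocabulary on slide 5).
AUTHOR = ("?V1", "dbo:author", "?V2")
IS_BOOK = ("?V1", "rdf:type", "dbo:Book")
IS_WRITER = ("?V2", "rdf:type", "dbo:Writer")
RELEASED = ("?V1", "dbp:releaseDate", "?V3")

abstractions = [
    ('"{V1}" is a short story by {V2}.', {AUTHOR, IS_BOOK, IS_WRITER}),
    ('"{V1}" is a short story by {V2}.', {AUTHOR, IS_WRITER, RELEASED}),
    ('"{V1}" is a short story by {V2}.', {AUTHOR, IS_BOOK}),
]

groups = group_by_sentence_pattern(abstractions)
mined = frequent_maximal_subpatterns(groups['"{V1}" is a short story by {V2}.'], c=2)
for pattern in mined:
    print(sorted(pattern))
```

With c = 2 the toy group yields two maximal patterns, {author, is-writer} and {author, is-book}; the single-triple pattern {author} is frequent but not maximal, so it is dropped. The brute-force enumeration over all c-sized combinations is only practical for small groups.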

Editor's Notes

  1. Mention: Paper rather technical (algorithms, formalizations). Presentation is not that technical – tries to convey the main ideas of the approach. Mention: Website with additional material (data samples, evaluation material) related to the publication: http://km.aifb.kit.edu/sites/bridge-patterns/INLG2014/
  2. Talk about: entities, SPO, binary relationships, URIs
  3. Query to run at http://dbpedia.org/sparql:
       PREFIX dbo: <http://dbpedia.org/ontology/>
       PREFIX dbp: <http://dbpedia.org/property/>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       SELECT ?book_label ?book_type_label ?author_label ?book_rD
       WHERE {
         ?book dbo:author ?author .
         ?book dbp:releaseDate ?book_rD .
         ?book rdf:type ?book_type .
         ?book_type rdfs:label ?book_type_label .
         ?book rdfs:label ?book_label .
         ?author rdfs:label ?author_label .
         ?author rdf:type dbo:Writer .
       }
  4. Mention: graph more complex than sentence -> e.g., that a person is alive (category living people) is implied by present tense instead of past tense Also: redundancies in the vocabulary
  5–15. Explain identification and modifiers. Only entities are matched, not relations. Relations will be identified when comparing graph patterns.
  16. What about overlapping matches? Why do I create individual hypothesis graph patterns?
  17. Mention: * Experts in English, German, SPARQL * How were templates selected? -> randomly, different complexities, material online
  18. Done.
  19. Done.
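
How a template is applied (slide 11 and note 3) can be sketched in Python without a SPARQL endpoint: the graph pattern is matched against the seven example triples, and the resulting bindings fill the sentence pattern. This is an illustrative sketch under assumptions, not the authors' code. Language tags on literals are omitted, the sentence pattern string is reconstructed from the verbalization result on slide 11, and `strip_parenthetical` is a hypothetical stand-in for a modifier that turns the label "Curtain (novel)" into "Curtain".

```python
import re

# Example RDF data from slide 11 (language tags omitted for brevity).
TRIPLES = [
    ("dbr:Curtain_(Novel)", "dbo:author", "dbr:Agatha_Christie"),
    ("dbr:Curtain_(Novel)", "rdf:type", "dbo:Book"),
    ("dbr:Curtain_(Novel)", "rdfs:label", "Curtain (novel)"),
    ("dbr:Curtain_(Novel)", "dbp:releaseDate", "September 1975"),
    ("dbr:Agatha_Christie", "rdf:type", "dbo:Writer"),
    ("dbr:Agatha_Christie", "rdfs:label", "Agatha Christie"),
    ("dbo:Book", "rdfs:label", "book"),
]

# Graph pattern of the template: the WHERE clause of the query on slide 11.
GRAPH_PATTERN = [
    ("?book", "dbo:author", "?author"),
    ("?book", "dbp:releaseDate", "?book_rD"),
    ("?book", "rdf:type", "?book_type"),
    ("?book_type", "rdfs:label", "?book_type_label"),
    ("?book", "rdfs:label", "?book_label"),
    ("?author", "rdfs:label", "?author_label"),
    ("?author", "rdf:type", "dbo:Writer"),
]

# Sentence pattern (assumed form, reconstructed from the verbalization result).
SENTENCE_PATTERN = "{book_label} is a {book_type_label} by {author_label} published in {book_rD}."


def unify(triple_pattern, triple, binding):
    """Extend binding so that triple_pattern equals triple, or return None."""
    new = dict(binding)
    for term, value in zip(triple_pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return new


def match(pattern, triples, binding=None):
    """Yield every binding under which all triple patterns occur in triples."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    head, rest = pattern[0], pattern[1:]
    for triple in triples:
        extended = unify(head, triple, binding)
        if extended is not None:
            yield from match(rest, triples, extended)


def strip_parenthetical(label):
    """Hypothetical modifier: drop a trailing parenthetical such as ' (novel)'."""
    return re.sub(r"\s*\([^)]*\)$", "", label)


def verbalize(binding):
    """Fill the sentence pattern with the values of one query solution."""
    values = {name.lstrip("?"): value for name, value in binding.items()}
    values["book_label"] = strip_parenthetical(values["book_label"])
    return SENTENCE_PATTERN.format(**values)


solutions = list(match(GRAPH_PATTERN, TRIPLES))
sentences = [verbalize(solution) for solution in solutions]
print(sentences)
```

On the example data the pattern has a single solution, and the sketch reproduces the verbalization result shown on slide 11.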