A language-independent method for the extraction of RDF verbalization templates

KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association
1 Institute of Applied Informatics and Formal Description Methods (AIFB), Karlsruhe, Germany
www.kit.edu
A language-independent method for the extraction of
RDF verbalization templates
Basil Ell,1 Andreas Harth1
8th International Natural Language Generation Conference
20 June 2014, Philadelphia, PA, USA

Motivation
More and more data openly available as RDF
Linked Open Data initiative

Motivation
Search engine: keywords, questions, etc. → NLG → text
Encyclopedia or Google Knowledge Graph: NLG → textual description of a thing

Example RDF data - Triples
Subject              Predicate        Object
dbr:Curtain_(Novel)  dbo:author       dbr:Agatha_Christie
dbr:Curtain_(Novel)  rdf:type         dbo:Book
dbr:Curtain_(Novel)  rdfs:label       "Curtain (novel)"@en
dbr:Curtain_(Novel)  dbp:releaseDate  "September 1975"@en
dbr:Agatha_Christie  rdf:type         dbo:Writer
dbr:Agatha_Christie  rdfs:label       "Agatha Christie"@en
dbo:Book             rdfs:label       "book"@en

Example RDF data - Graph

Overview
Motivation
RDF Verbalization Templates
Automatic Template Extraction
Evaluation
Related Work
Summary

RDF VERBALIZATION TEMPLATES

RDF Verbalization Template (1/2)
Graph pattern (GP)
Sentence pattern (SP)

GP represented as SPARQL query:
  SELECT ?book_label ?book_type_label ?author_label ?book_rD
  WHERE {
    ?book dbo:author ?author .
    ?book dbp:releaseDate ?book_rD .
    ?book rdf:type ?book_type .
    ?book_type rdfs:label ?book_type_label .
    ?book rdfs:label ?book_label .
    ?author rdfs:label ?author_label .
    ?author rdf:type dbo:Writer .
  }
Query results:
  book_label = "Curtain (novel)"
  book_type_label = "book"
  author_label = "Agatha Christie"
  book_rD = "September 1975"
Verbalization result:
  Curtain is a book by Agatha Christie published in September 1975.
RDF data:
  Subject              Predicate        Object
  dbr:Curtain_(Novel)  dbo:author       dbr:Agatha_Christie
  dbr:Curtain_(Novel)  rdf:type         dbo:Book
  dbr:Curtain_(Novel)  rdfs:label       "Curtain (novel)"@en
  dbr:Curtain_(Novel)  dbp:releaseDate  "September 1975"@en
  dbr:Agatha_Christie  rdf:type         dbo:Writer
  dbr:Agatha_Christie  rdfs:label       "Agatha Christie"@en
  dbo:Book             rdfs:label       "book"@en
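How a template turns data into text can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it matches the graph pattern against the example triples with a naive backtracking matcher instead of a SPARQL engine, then fills a sentence pattern. The sentence pattern string, the function names and the `strip_parenthetical` modifier are assumptions made for this sketch, and language tags such as @en are dropped for brevity.

```python
# Example RDF data from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("dbr:Curtain_(Novel)", "dbo:author", "dbr:Agatha_Christie"),
    ("dbr:Curtain_(Novel)", "rdf:type", "dbo:Book"),
    ("dbr:Curtain_(Novel)", "rdfs:label", "Curtain (novel)"),
    ("dbr:Curtain_(Novel)", "dbp:releaseDate", "September 1975"),
    ("dbr:Agatha_Christie", "rdf:type", "dbo:Writer"),
    ("dbr:Agatha_Christie", "rdfs:label", "Agatha Christie"),
    ("dbo:Book", "rdfs:label", "book"),
]

# Graph pattern (GP): the WHERE clause of the query, terms starting with "?" are variables.
GRAPH_PATTERN = [
    ("?book", "dbo:author", "?author"),
    ("?book", "dbp:releaseDate", "?book_rD"),
    ("?book", "rdf:type", "?book_type"),
    ("?book_type", "rdfs:label", "?book_type_label"),
    ("?book", "rdfs:label", "?book_label"),
    ("?author", "rdfs:label", "?author_label"),
    ("?author", "rdf:type", "dbo:Writer"),
]

# Sentence pattern (SP): assumed wording, chosen to reproduce the slide's output.
SENTENCE_PATTERN = (
    "{book_label} is a {book_type_label} by {author_label} published in {book_rD}."
)


def unify(pattern_triple, data_triple, binding):
    """Extend binding so pattern_triple equals data_triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern_triple, data_triple):
        if term.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match(pattern, triples, binding=None):
    """Yield every binding under which all triple patterns occur in the data."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    for triple in triples:
        extended = unify(pattern[0], triple, binding)
        if extended is not None:
            yield from match(pattern[1:], triples, extended)


def strip_parenthetical(value):
    """Hypothetical modifier: 'Curtain (novel)' becomes 'Curtain'."""
    return value.split(" (")[0]


def verbalize(graph_pattern, sentence_pattern, triples):
    """Return one sentence per match of the graph pattern."""
    sentences = []
    for binding in match(graph_pattern, triples):
        values = {var[1:]: strip_parenthetical(val) for var, val in binding.items()}
        sentences.append(sentence_pattern.format(**values))
    return sentences


print(verbalize(GRAPH_PATTERN, SENTENCE_PATTERN, TRIPLES))
```

On real data the matching step would be delegated to a SPARQL endpoint; the backtracking matcher only keeps the sketch free of dependencies.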

AUTOMATIC TEMPLATE EXTRACTION

Template Extraction (1/6) - Overview
Parallel text-data corpus → RDF verbalization templates
1. Sentence Collection
2. Text-Data Alignment
3. Abstraction
4. Grouping
5. Pattern Mining
6. Template Creation

Experiment:
Text from Wikipedia
Data from DBpedia
10 Virtual Machines
8 vCPUs
8GB RAM
40GB Disk
Extraction ran for 2 weeks

Template Extraction (2/6) - Features
Distant-supervised: no hand-labeled training data required
Simultaneous multi-relation learning: simultaneously learning all relations in a sentence
Frequent maximal subgraph pattern mining: identify commonalities among RDF graph patterns
Language independent: does not rely on syntactic parsing

Example Template (1/2)

Example Template (2/2)

Template Extraction (3/6) - Alignment
[Figure sequence: a sentence is aligned step by step with RDF data. Legend: entity, literal, identified entity, identified literal, label; m1 to m5: modifier, matched string]
Language-independent approach: no syntactic parsing
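Alignment without syntactic parsing can be approximated by plain string matching of labels. The sketch below is an assumption-laden illustration rather than the method from the talk: `label_variants`, `align` and the single `strip_parenthetical` modifier are hypothetical names, and naive substring search may also match inside longer words.

```python
import re


def label_variants(label):
    """Return (modifier name, string) pairs to try for one label.

    The modifiers are hypothetical; only one is shown here.
    """
    variants = [("identity", label)]
    stripped = re.sub(r"\s*\([^)]*\)$", "", label)
    if stripped != label:
        variants.append(("strip_parenthetical", stripped))
    return variants


def align(sentence, labels):
    """Find label strings in the sentence.

    labels maps a resource (entity IRI or literal) to its label.
    Returns (start, end, resource, modifier) tuples sorted by position.
    """
    matches = []
    for resource, label in labels.items():
        for modifier, text in label_variants(label):
            start = sentence.find(text)
            if start != -1:
                matches.append((start, start + len(text), resource, modifier))
                break  # keep the first variant that matches
    return sorted(matches)


SENTENCE = "Curtain is a book by Agatha Christie published in September 1975."
LABELS = {
    "dbr:Curtain_(Novel)": "Curtain (novel)",
    "dbr:Agatha_Christie": "Agatha Christie",
    "dbo:Book": "book",
    "September 1975": "September 1975",  # a literal stands for itself
}

for start, end, resource, modifier in align(SENTENCE, LABELS):
    print(SENTENCE[start:end], "->", resource, "via", modifier)
```

Because the matcher only compares strings, the same code applies to German or any other language for which labels are available.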

Template Extraction (4/6) – Abstraction
Abstraction 1: hypothesis graph pattern 1, pattern 1
Abstraction 2: hypothesis graph pattern 2, pattern 2
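The sentence side of an abstraction can be sketched as replacing each matched string with a numbered variable. This is a simplified illustration under stated assumptions: the function name `abstract` and the `{V1}`, `{V2}` numbering by position are choices made here, and the construction of the hypothesis graph pattern is not covered.

```python
def abstract(sentence, spans):
    """Replace matched spans with variables.

    spans is a list of non-overlapping (start, end) character offsets.
    Returns the sentence pattern and a mapping from variable to matched string.
    """
    parts = []
    variables = {}
    cursor = 0
    for index, (start, end) in enumerate(sorted(spans), start=1):
        name = f"V{index}"
        parts.append(sentence[cursor:start])  # text between two matches
        parts.append("{" + name + "}")        # placeholder for the match
        variables[name] = sentence[start:end]
        cursor = end
    parts.append(sentence[cursor:])
    return "".join(parts), variables


sentence = "Curtain is a book by Agatha Christie published in September 1975."
strings = ["Curtain", "Agatha Christie", "September 1975"]
spans = [(sentence.index(s), sentence.index(s) + len(s)) for s in strings]

pattern, variables = abstract(sentence, spans)
print(pattern)
print(variables)
```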

Template Extraction (5/6) - Grouping
Group graph patterns with equivalent sentence patterns:
'"{V1}" is a short story by {V2}.':
  abstraction-64451-1
  abstraction-88393-1
  abstraction-4732-1
  abstraction-50480-1
'"{V1}" is a single by American {V9} {V4} {V8}.':
  abstraction-22205-1
  abstraction-22205-3
  abstraction-72533-1
  abstraction-127891-2
'{V1} (born {V2}) is a German footballer.':
  abstraction-86372-1
  abstraction-86415-1
Each group holds the hypothesis graph patterns with an equivalent sentence pattern.
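Grouping amounts to bucketing abstractions by their sentence pattern. The following sketch assumes that equivalence means string equality of the sentence patterns, which may be stricter than what the authors use; the function names are hypothetical, and the abstraction identifiers are taken from the slide.

```python
from collections import defaultdict


def group_by_sentence_pattern(abstractions):
    """Map each sentence pattern to the ids of the abstractions that share it."""
    groups = defaultdict(list)
    for abstraction_id, sentence_pattern in abstractions:
        groups[sentence_pattern].append(abstraction_id)
    return dict(groups)


def groups_of_min_size(groups, min_size):
    """Keep only groups with at least min_size members."""
    return {p: ids for p, ids in groups.items() if len(ids) >= min_size}


ABSTRACTIONS = [
    ("abstraction-64451-1", '"{V1}" is a short story by {V2}.'),
    ("abstraction-86372-1", "{V1} (born {V2}) is a German footballer."),
    ("abstraction-88393-1", '"{V1}" is a short story by {V2}.'),
    ("abstraction-86415-1", "{V1} (born {V2}) is a German footballer."),
    ("abstraction-4732-1", '"{V1}" is a short story by {V2}.'),
    ("abstraction-50480-1", '"{V1}" is a short story by {V2}.'),
]

groups = group_by_sentence_pattern(ABSTRACTIONS)
print(groups_of_min_size(groups, 3))
```

A size threshold like this one corresponds to the "#groups≥5" column reported in the evaluation, where only groups with at least five members are counted.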

Template Extraction (6/6) - fmSpan
fmSpan: frequent maximal subgraph pattern mining
Input:
  Set of graph patterns
  Minimal coverage value: c
Output: set of graph patterns, where each graph pattern
  is a subgraph of at least c input graph patterns (→ frequent)
  cannot be extended while maintaining coverage (→ maximal)
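The two output conditions can be illustrated with a strongly simplified sketch. It is not fmSpan itself: it assumes that variable names are already aligned across graph patterns, so that each graph pattern is a set of triple patterns and "subgraph" reduces to "subset". It also ignores connectivity and subgraph isomorphism, and its level-wise search is exponential in the worst case. All names are hypothetical.

```python
def frequent_maximal_patterns(graph_patterns, c):
    """Return all triple-pattern sets that are frequent and maximal.

    frequent: contained in at least c of the input graph patterns
    maximal:  adding any further triple pattern drops coverage below c
    """
    graphs = [frozenset(g) for g in graph_patterns]

    def coverage(candidate):
        return sum(1 for g in graphs if candidate <= g)

    all_edges = {edge for g in graphs for edge in g}
    frequent_edges = {e for e in all_edges if coverage(frozenset([e])) >= c}

    level = {frozenset([e]) for e in frequent_edges}
    maximal = set()
    while level:
        next_level = set()
        for candidate in level:
            # Try to grow the candidate by one frequent triple pattern.
            extensions = {candidate | {e} for e in frequent_edges - candidate}
            frequent_extensions = {x for x in extensions if coverage(x) >= c}
            if frequent_extensions:
                next_level |= frequent_extensions
            else:
                maximal.add(candidate)  # no extension keeps the coverage
        level = next_level
    return maximal


AUTHOR = ("?book", "dbo:author", "?author")
IS_BOOK = ("?book", "rdf:type", "dbo:Book")
RELEASED = ("?book", "dbp:releaseDate", "?date")
IS_WRITER = ("?author", "rdf:type", "dbo:Writer")

PATTERNS = [
    {AUTHOR, IS_BOOK, RELEASED},
    {AUTHOR, IS_BOOK, IS_WRITER},
    {AUTHOR, IS_BOOK},
]

print(frequent_maximal_patterns(PATTERNS, c=2))
```

With c=2 only the pattern shared by all three inputs survives; with c=1 the two largest inputs are returned, since the third is contained in both.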

EVALUATION

Evaluation (1/4) - Experiment
Parallel text-data corpus:
88,708,622 triples
4,004,478 English documents
716,049 German documents
3,811,992 English sentences
794,040 German sentences
3,434,108 abstracted English sentences
530,766 abstracted German sentences
(with at least two identified entities)
     #groups≥5  #templates  #all groups
en   4569       3816        686,687
de   2130       1250        269,551

Evaluation (2/4) - Coverage
How often can a template be applied?
About 300 templates where each template can be used to verbalize between 10,000 and 100,000 subgraphs.
[Bar chart: number of templates (0 to 350) for English (#en) and German (#de) per number of verbalizable subgraphs, in bins 1–10, 10–100, 100–1000, 1000–10,000, 10,000–100,000, 100,000–1,000,000, 1,000,000–10,000,000, 10,000,000–100,000,000]

Evaluation (3/4)
User study: 10 English templates, 10 German templates, 6 evaluators, 200 verbalizations
Accuracy (1): Is everything that is expressed in the graph pattern also expressed in the sentence pattern?
Measured for each triple pattern within the GP:
(1) The triple pattern is explicitly expressed
(2) The triple pattern is implied
(3) The triple pattern is not expressed
(4) Unsure
Accuracy (2): Is everything that is expressed in the sentence pattern also expressed in the graph pattern?
(1) Everything is expressed
(2) Most things are expressed
(3) Some things are expressed
(4) Nothing is expressed
[Two bar charts: counts per answer option (1) to (4) for en and de; Accuracy (1) on an axis from 0 to 200, Accuracy (2) on an axis from 0 to 20]

Evaluation (4/4)
Syntactical Correctness: How syntactically correct are verbalizations?
(1) Completely syntactically correct
(2) Almost syntactically correct
(3) Some syntactical errors
(4) Strongly syntactically incorrect
Understandability: How understandable are verbalizations?
(1) The meaning is clear
(2) The meaning is clear, but there are some problems in word usage and/or style
(3) The basic thrust is clear, but the evaluator is not sure of some detailed parts because of word usage problems
(4) Contains many word usage problems, and the evaluator can only guess at the meaning
(5) Cannot be understood at all
[Two bar charts: counts per answer option for en and de; Syntactical Correctness with options (1) to (4) on an axis from 0 to 250, Understandability with options (1) to (5) on an axis from 0 to 300]

RELATED WORK

Related Work (1/4)
(Welty et al., 2010)
Focus on IE
Input sentences are parsed
Regard relations between proper nouns only
Does not consider a graph of relations

Related Work (2/4)
(Duma and Klein, 2013)
Focus on NLG

Related Work (3/4)
(Gerber and Ngomo, 2011)
Focus on IE
< ’s acquisition of > pattern for property subsidiary
“Google’s acquisition of Youtube comes as online
video is really starting to hit its stride.”
relation expressed by string between entities

Related Work (4/4)
Distant supervision
(Craven and Kumlien, 1999), (Bunescu and Mooney,
2007), (Carlson et al., 2009), (Mintz et al., 2009), (Welty
et al., 2010), (Hoffmann et al., 2011), (Surdeanu et al.,
2012)
Simultaneous multi-relation learning
(Carlson et al., 2009)

SUMMARY

Summary
Introduced RDF verbalization templates
Introduced template extraction approach
Distant-supervised
Language independent
Simultaneous multi-relation learning
Frequent maximal subgraph pattern mining
Evaluation
Large parallel text-data corpus for en and de
Good syntactical correctness & understandability
Accuracy needs to be improved in future work

Thank you for your attention!
The authors acknowledge the support of the European Commission's Seventh Framework Programme
FP7-ICT-2011-7 (XLike, Grant 288342).
http://km.aifb.kit.edu/sites/bridge-patterns/INLG2014/

References (1/2)
Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In
Annual meeting-association for Computational Linguistics, volume 45, pages 576–583.
Andrew Carlson, Justin Betteridge, Estevam R Hruschka Jr, and Tom M Mitchell. 2009. Coupling semi-supervised learning
of categories and relations. In Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for
Natural Language Processing, pages 1–9. Association for Computational Linguistics.
Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text
sources. In Thomas Lengauer, Reinhard Schneider, Peer Bork, Douglas L. Brutlag, Janice I. Glasgow, Hans-Werner
Mewes, and Ralf Zimmer, editors, ISMB, pages 77–86. AAAI.
Daniel Duma and Ewan Klein. 2013. Generating Natural Language from Linked Data: Unsupervised template extraction,
pages 83–94. Association for Computational Linguistics, Potsdam, Germany.
Daniel Gerber and A-C Ngonga Ngomo. 2011. Bootstrapping the linked data web. In 1st Workshop on Web Scale
Knowledge Extraction @ International Semantic Web Conference, volume 2011.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S Weld. 2011. Knowledge-based weak
supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 541–550. Association for
Computational Linguistics.

References (2/2)
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled
data. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint
Conference on Natural Language Processing of the AFNLP: Volume 2 - ACL-IJCNLP 09, pages 1003–1011.
Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning
for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning, pages 455–465. Association for Computational Linguistics.
Chris Welty, James Fan, David Gondek, and Andrew Schlaikjer. 2010. Large scale relation detection. In Proceedings of the
NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 24–
33. Association for Computational Linguistics.
