1. The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Appositions, and Adjectives
   Iman Mirrezaei¹, Bruno Martins², and Isabel F. Cruz¹
   ¹ ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, USA
   ² Instituto Superior Técnico, Universidade de Lisboa, Portugal
2. Motivation
   • How to extract useful knowledge from textual resources?
   • How to identify relations between entities?
   ◦ e.g., "Microsoft is an American corporation headquartered in Redmond, Washington"
   ◦ e.g., "Michelle Obama (born January 17, 1964), an American lawyer and writer, is the wife of the …"
3. Triples
   • Each triple represents an atomic fact by stating a subject, a predicate (property), and an object (value)
   ◦ e.g., "The sky has the color blue." → <the sky; has; the color blue>
   • Triples can be expressed by verbs, or by particular noun phrases in textual resources
   ◦ Verb-mediated formats
   ◦ Noun-mediated formats
   • An information extractor converts an input text into a set of triples
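The <subject; predicate; object> notation above maps naturally onto a tiny data structure. A minimal sketch (this is an illustration, not the actual Triplex representation):

```python
from collections import namedtuple

# An atomic fact as a <subject; predicate; object> triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The slide's example: "The sky has the color blue."
fact = Triple("the sky", "has", "the color blue")
print(fact.subject)    # the sky
print(fact.predicate)  # has
print(fact.object)     # the color blue
```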
4. Information extractors
   • Verb-mediated triple extractors
   ◦ TextRunner [Banko et al. 2007], WOE [Wu and Weld 2010], ReVerb [Fader et al. 2011], and OLLIE [Mausam et al. 2012]
   ◦ e.g., "Obama will be elected President of the United States" → <Obama; will be elected; President of the United States>
   • Noun-mediated triple extractors
   ◦ OLLIE: the first noun-mediated triple extractor
   ◦ OLLIE has patterns to extract noun-mediated triples only if they can also be expressed in a verb-mediated format
   ◦ e.g., "Microsoft co-founder Bill Gates spoke at the conference" → <Bill Gates; be co-founder of; Microsoft>
5. Information extractors
   [Comparison chart: TextRunner, WOE-pos, WOE-parse, ReVerb, OLLIE, and Triplex (the suggested system) compared on deep syntactic features, shallow syntactic features, lexical constraints, and type constraints (e.g., person, location, …)]
6. Noun-mediated triples
   • Noun-mediated triples can be expressed through noun phrases with adjectives, compound nouns, and appositions
   • How to extract noun-mediated triples that are not expressed via verb-mediated formats?
   • How to extract templates automatically from text to generate noun-mediated triples?
7. Architecture
   [Pipeline diagram: text → Sentence Extraction → Template Extraction, supported by the Stanford NLP Toolkit, WordNet, synonym sets of Wikipedia pages, and infoboxes]
8. The bootstrapping process
   • A sentence of a wiki page is extracted if it contains an infobox value (object) and a synset member (subject)
   ◦ The sentence is kept only if there is a dependency path between object and subject (noun, adjective, or apposition dependencies)
   ◦ Tokens on the dependency paths between subject and object are annotated with POS tags, lexical constraints, WordNet synsets, and named entity tags
   • Annotated paths are treated as extraction templates
   • A constraint is imposed on the length of the dependency path
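The first bootstrapping step — keep a sentence only when it mentions both an infobox value and a synset member — can be sketched as a simple filter. This is a hypothetical simplification (`select_sentence` and plain substring matching are my own illustration; the actual system additionally checks the dependency path between the two mentions):

```python
def select_sentence(sentence, infobox_values, synset_members):
    """Keep a wiki sentence only if it mentions both an infobox value
    (candidate object) and a synset member (candidate subject).
    Sketch only: real Triplex then verifies a dependency path exists."""
    text = sentence.lower()
    obj = next((v for v in infobox_values if v.lower() in text), None)
    subj = next((s for s in synset_members if s.lower() in text), None)
    if obj is not None and subj is not None:
        return subj, obj
    return None

sent = "Microsoft is an American corporation headquartered in Redmond, Washington"
print(select_sentence(sent, ["Redmond, Washington"], ["corporation"]))
# -> ('corporation', 'Redmond, Washington')
print(select_sentence("No match here", ["Redmond, Washington"], ["corporation"]))
# -> None
```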
9. Example
   • "Microsoft Corporation is an American multinational software corporation headquartered in Redmond, Washington that develops…"
   ◦ vmod(corporation-8, headquartered-9)
   ◦ prep(headquartered-9, in-10)
   ◦ nn(Washington-13, Redmond-11)
10. Example (annotated sentence)
    Sentence: Microsoft is an American corporation headquartered in Redmond, Washington
    POS tags:        NNP VBZ DT JJ NN VBN IN NNP , NNP
    Named entities:  ORG O O MISC O O O LOC O LOC
    Dependencies: nn, vmod, prep-in; "Microsoft" is linked to "corporation" by coreference
    Infobox name: Headquarters; infobox value: Redmond, Washington
    Range of headquarters: Location
    Synset member: Corporation; synset member type: Organization
    Lexical constraint: "headquartered in"
    Annotation layers: POS tags, named entities, WordNet synsets, occurrences of subject and object
    Labels: O = no label, PER = person, NUM = number, ORG = organization
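The dependency path between candidate subject ("corporation") and object ("Redmond, Washington") in this example can be recovered with a small graph search. A sketch under assumptions: the edge list below is hand-built from the dependencies on the slide, and the `in → Redmond` edge is my own addition to connect the preposition to its object (the slide shows the collapsed relation `prep-in`):

```python
from collections import deque

# Dependency edges from the example sentence, as (head, dependent, label).
edges = [
    ("corporation", "headquartered", "vmod"),
    ("headquartered", "in", "prep"),
    ("in", "Redmond", "pobj"),   # assumed edge; slide collapses this into prep-in
    ("Washington", "Redmond", "nn"),
]

def dependency_path(start, goal):
    """Breadth-first search over the (undirected) dependency graph for
    the token path between a candidate subject and object; Triplex
    annotates such paths to build extraction templates."""
    adj = {}
    for head, dep, _label in edges:
        adj.setdefault(head, []).append(dep)
        adj.setdefault(dep, []).append(head)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(dependency_path("corporation", "Washington"))
# -> ['corporation', 'headquartered', 'in', 'Redmond', 'Washington']
```

The slide deck also mentions a constraint on path length; in a sketch like this, that would be a simple `len(path)` check on the result.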
11. Templates
    • Templates express how a class of triples is expressed in a sentence
    ◦ Deep syntactic features: dependencies
    ◦ Shallow syntactic features: POS tags, noun phrases
    ◦ Lexical features
    ◦ Named entity types and WordNet synsets
    ◦ Property ranges (Person, Organization, Location, or unknown)
12. Triplex
    • Confidence score for triples
    ◦ A logistic regression classifier
    ◦ Features: frequency of the extraction template, presence of lexical words, range of the property, semantic object type
    • Template matching
    ◦ Candidate subjects are recognized by NER types and WordNet synsets
    ◦ The dependency paths between the subject and all potential objects are annotated
    ◦ Annotated paths are matched against templates
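The logistic-regression confidence score boils down to a sigmoid over a weighted sum of the features listed above. A minimal sketch: the feature values, weights, and bias below are invented for illustration (the paper trains these; only the feature *names* come from the slide):

```python
import math

def triple_confidence(features, weights, bias=0.0):
    """Logistic-regression confidence in [0, 1] for an extracted triple.
    Weights and bias would be learned; here they are placeholders."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Feature names follow the slide; values/weights are illustrative.
features = {"template_freq": 0.8, "has_lexical_word": 1.0,
            "range_matches": 1.0, "object_type_matches": 1.0}
weights = {"template_freq": 1.5, "has_lexical_word": 0.7,
           "range_matches": 1.2, "object_type_matches": 0.9}
score = triple_confidence(features, weights, bias=-2.0)
print(round(score, 3))  # 0.881  (sigmoid of z = 2.0)
```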
13. Evaluation
    • Automatic evaluation following the procedure suggested by Bronzi et al. [2012]
    ◦ 1000 random sentences from Wikipedia
    ◦ A gold standard is created using PMI, DBpedia, and Freebase
    • Manual evaluation
    ◦ 50 random sentences from Wikipedia
    ◦ The agreement between the automatic and manual evaluations is about 0.71
14. The gold standard
    • A fact is a triple <subject, property, object>
    • All possible entities are recognized by NER types and WordNet synsets
    • All verbs (predicates) are detected by Stanford CoreNLP, and predicates are expanded by adding DBpedia and Freebase properties
    • All extracted facts of sentences are verified against
    ◦ DBpedia
    ◦ Freebase
15. Evaluation results

                      Automatic evaluation           Manual evaluation
                      Precision  Recall  F-measure   Precision  Recall  F-measure
    REVERB            0.61       0.15    0.24        0.55       0.11    0.18
    OLLIE             0.64       0.30    0.40        0.65       0.32    0.42
    OLLIE*            0.62       0.10    0.17        0.63       0.11    0.18
    Triplex           0.55       0.17    0.25        0.62       0.22    0.32
    Triplex + OLLIE   0.57       0.40    0.47        0.63       0.44    0.51
    Triplex + REVERB  0.58       0.32    0.41        0.55       0.35    0.42

    OLLIE* only generates triples according to noun-mediated formats
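The F-measure column is the harmonic mean of precision and recall (F1). As a quick sanity check of two rows of the automatic-evaluation numbers:

```python
def f_measure(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# REVERB (automatic): P = 0.61, R = 0.15
print(round(f_measure(0.61, 0.15), 2))  # 0.24
# Triplex + OLLIE (automatic): P = 0.57, R = 0.40
print(round(f_measure(0.57, 0.40), 2))  # 0.47
```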
16. Error analysis
    Missed extractions:
    ◦ 10% no semantic types
    ◦ 12% dependency parser problems
    ◦ 7% coreference errors
    ◦ 6% over-generalized templates
    ◦ 65% verb-mediated triples (outside the scope of Triplex)
17. Correctly extracted triples
    Distribution by triple category:
    • Noun-mediated
    ◦ 12% conjunctions, adjectives, and noun phrases
    ◦ 9% appositions and parenthetical phrases
    ◦ 6% titles or professions
    ◦ 8% templates with lexicon
    • Verb-mediated
    ◦ 65% verb-mediated triples
18. Conclusion
    • Triplex generates noun-mediated triples from compound nouns, adjectives, and appositions
    • Triplex complements the output of verb-mediated triple extractors
    • IE systems like Triplex can assist authors in annotating Wikipedia pages (recognizing missing infobox values)
19. Future work
    • Improve results for triples involving numerical values with different units (e.g., square meters vs. meters)
    • Enrich the bootstrapping process by using a probabilistic knowledge base (e.g., Probase [2012])
20. References
    • M. Banko, M.J. Cafarella, S. Soderland, M. Broadhead, O. Etzioni: Open Information Extraction from the Web. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 2670–2676 (2007)
    • A. Fader, S. Soderland, O. Etzioni: Identifying Relations for Open Information Extraction. In: Conference on Empirical Methods in Natural Language Processing, pp. 1535–1545 (2011)
    • Mausam, M. Schmitz, R. Bart, S. Soderland, O. Etzioni: Open Language Learning for Information Extraction. In: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 523–534 (2012)
    • F. Wu and D.S. Weld: Open Information Extraction Using Wikipedia. In: Annual Meeting of the Association for Computational Linguistics, pp. 118–127 (2010)
    • M. Bronzi, Z. Guo, F. Mesquita, D. Barbosa, P. Merialdo: Automatic Evaluation of Relation Extraction Systems on Large-scale. In: Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, pp. 19–24 (2012)
    • W. Wu, H. Li, H. Wang, K.Q. Zhu: Probase: A Probabilistic Taxonomy for Text Understanding. In: ACM SIGMOD International Conference on Management of Data, pp. 481–492 (2012)