Relation-wise Automatic Domain-Range Information Management for Knowledge Entries

Md-Mizanur Rahoman & Ryutaro Ichise
The Graduate University for Advanced Studies, Tokyo, Japan
National Institute of Informatics, Tokyo, Japan
Begum Rokeya University, Rangpur, Bangladesh

Outline
• Background
• Problem & Possible Solution
• Proposed Framework
• Experiment
• Conclusion
30-Jan-2017 | Relation-wise Automatic Domain-Range Information Management for Knowledge Entries | Rahoman & Ichise

Background
• knowledge-base (KB) construction and management have gained interest
• relations play an important role in a KB
• construction – generation of knowledge entries of the form
<Subject, relation, Object>
• e.g., <Obama, born_in, Hawaii>
• management – validation of knowledge entries
• e.g., domain(born_in) = Person, range(born_in) = Place
• not all knowledge bases maintain domain-range validation for relations, e.g., Freebase
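Domain-range validation presupposes that each relation has a declared domain and range and that entities carry types. A minimal Python sketch under those assumptions follows; the `ENTITY_TYPES`, `DOMAIN` and `RANGE` tables and the `is_valid` helper are hypothetical and do not correspond to any particular KB's API. When a relation has no declaration (the situation in KBs that do not maintain this information), the sketch can only accept the entry.

```python
# Hypothetical entity types; a real KB would store these as type entries.
ENTITY_TYPES = {
    "Obama": {"Person"},
    "Hawaii": {"Place"},
    "Paprika": {"Book"},
}

# Hypothetical domain/range declarations (the born_in example from the slide).
DOMAIN = {"born_in": "Person"}
RANGE = {"born_in": "Place"}


def is_valid(subject, relation, obj):
    """Check an entry <subject, relation, obj> against declared domain and range.

    A relation without a declaration imposes no constraint, which is the
    situation in KBs that do not maintain domain-range information.
    """
    domain = DOMAIN.get(relation)
    range_ = RANGE.get(relation)
    subject_ok = domain is None or domain in ENTITY_TYPES.get(subject, set())
    object_ok = range_ is None or range_ in ENTITY_TYPES.get(obj, set())
    return subject_ok and object_ok


print(is_valid("Obama", "born_in", "Hawaii"))  # True
print(is_valid("Hawaii", "born_in", "Obama"))  # False
print(is_valid("Paprika", "author", "14"))     # True: nothing declared for author
```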

Problem
• existence of wrong entries in current KBs, e.g., the last row below
• costly maintenance – domain-range selection is not automatic
• manual checking is time-consuming
• requires domain-level expertise
Subject            | Relation | Object
Paprika            | type     | Book
Paprika            | author   | Yasutaka Tsutsui
Freedom in Exile   | type     | Book
Freedom in Exile   | author   | 14

Possible Solution
• Intuition
• the Subjects of a relation should share some similarity
• extract features for Subject entities and generate a learning model, e.g.,
• a Subject of born_in complies only if it is a Person, i.e., the domain
• the Objects of a relation should share some similarity
• extract features for Object entities and generate a learning model, e.g.,
• an Object of born_in complies only if it is a Place, i.e., the range
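One way to make this intuition concrete is a nearest-centroid check: if the Subjects of born_in share some similarity, a new Subject candidate should lie closer to their average vector than an unrelated word does. The sketch below uses made-up three-dimensional vectors and hypothetical helper names; it only illustrates the intuition and is not the learning model the framework actually uses (which is decision-tree based).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Made-up vectors: person-like words lean on axis 0, place-like words on axis 1.
vectors = {
    "Obama": [0.9, 0.1, 0.0],
    "Trump": [0.8, 0.2, 0.1],
    "Clinton": [0.7, 0.25, 0.1],
    "Hawaii": [0.1, 0.9, 0.1],
    "Chicago": [0.2, 0.8, 0.0],
}

# Known Subjects of born_in define the reference point.
subject_center = centroid([vectors["Obama"], vectors["Trump"]])

for word in ("Clinton", "Chicago"):
    print(word, round(cosine(subject_center, vectors[word]), 3))
```

With these toy vectors, Clinton scores far closer to the Subject centroid than Chicago does, which is the behaviour the intuition predicts for a domain check.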

Proposed Framework
• required resources
• language-specific relations, e.g., born_in, spouse, author
• language-specific training examples, i.e., knowledge entries such as
Obama born_in Hawaii
Trump born_in New York
Clinton born_in Chicago
… … …
• a language-specific large text corpus
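Before features can be extracted, the training entries have to be split into Subject and Object elements per relation. A small parsing sketch, assuming one space-separated entry per line and a known set of relation names (the `KNOWN_RELATIONS` set and both function names are hypothetical):

```python
from collections import defaultdict

# Hypothetical relation inventory; relation names contain no spaces.
KNOWN_RELATIONS = {"born_in", "spouse", "author"}


def parse_entry(line, relations=KNOWN_RELATIONS):
    """Split 'Subject relation Object' at the first known relation token,
    so multiword Subjects and Objects (e.g. 'New York') stay intact."""
    tokens = line.split()
    for i, token in enumerate(tokens):
        if token in relations:
            return " ".join(tokens[:i]), token, " ".join(tokens[i + 1:])
    raise ValueError(f"no known relation in: {line!r}")


def group_by_relation(lines):
    """Collect (subject, object) pairs for each relation."""
    grouped = defaultdict(list)
    for line in lines:
        subject, relation, obj = parse_entry(line)
        grouped[relation].append((subject, obj))
    return dict(grouped)


entries = ["Obama born_in Hawaii", "Trump born_in New York", "Clinton born_in Chicago"]
print(group_by_relation(entries))
# {'born_in': [('Obama', 'Hawaii'), ('Trump', 'New York'), ('Clinton', 'Chicago')]}
```

Multiword elements such as "New York" would still need to be treated as a single token (for example joined as New_York) before a word vector can be looked up; the slides do not say how this is handled.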

Proposed Framework
• process
• Word Vectorizer
• generates feature vectors for words from a large text corpus
• Model Generator
• generates supervised machine learning models from the extracted features

Word Vectorizer
• takes a large text corpus as input
• uses the Word2Vec* implementation for word embedding
• generates a feature vector for each word in the corpus vocabulary
• the vectors capture the linguistic context in which words occur in the corpus
• similar words are mapped to similar vectors
* https://code.google.com/p/word2vec/
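Word2Vec learns dense vectors with a shallow neural network, so a faithful example would rely on the original C tool or a library such as gensim (its `Word2Vec` class). To stay self-contained, the sketch below swaps in a much simpler count-based co-occurrence model. It is not Word2Vec, but it shows the same distributional idea stated above: words that appear in similar contexts end up with similar vectors. The tiny corpus and the function names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector (Counter) of the words seen within
    `window` positions of it. A count-based stand-in for Word2Vec."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse Counter vectors (missing keys count as 0)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = [
    "obama was born in hawaii",
    "trump was born in queens",
    "obama visited hawaii last year",
    "trump visited queens last year",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["obama"], vecs["trump"]), 3))   # 0.75
print(round(cosine(vecs["obama"], vecs["hawaii"]), 3))  # 0.408
```

The two person names share most of their contexts and therefore score higher than a person/place pair, which is the property the Model Generator relies on.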

Model Generator
• For each relation
• collect positive and negative training words
• collect the feature vectors of the training words
• generate two supervised machine learning models (a domain model and a range model) that classify
• whether a word element belongs to the domain or not
• whether a word element belongs to the range or not

• positive features
• collected from existing knowledge entries
• divided into Subject element feature vectors and Object element feature vectors
Obama born_in Hawaii
Trump born_in New York
Clinton born_in Chicago
… … …
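Splitting one relation's positive entries into Subject and Object feature vectors can be sketched as follows, assuming the embeddings are available as a word-to-vector dictionary. Skipping elements that have no vector (such as a multiword "New York" that was never tokenized as one unit) is an assumption of this sketch; the slides do not say how out-of-vocabulary elements are handled. The function name is hypothetical.

```python
def positive_features(pairs, embeddings):
    """Split one relation's (subject, object) pairs into Subject and Object
    feature vectors. Elements without a vector are skipped (an assumption)."""
    subject_vectors, object_vectors = [], []
    for subject, obj in pairs:
        if subject in embeddings:
            subject_vectors.append(embeddings[subject])
        if obj in embeddings:
            object_vectors.append(embeddings[obj])
    return subject_vectors, object_vectors


# Toy embeddings; "New York" is deliberately missing to show the skip.
embeddings = {
    "Obama": [0.9, 0.1],
    "Trump": [0.8, 0.2],
    "Clinton": [0.7, 0.25],
    "Hawaii": [0.1, 0.9],
    "Chicago": [0.2, 0.8],
}
born_in = [("Obama", "Hawaii"), ("Trump", "New York"), ("Clinton", "Chicago")]

subjects, objects = positive_features(born_in, embeddings)
print(len(subjects), len(objects))  # 3 2
```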

• negative features
• collected from randomly chosen words of the text corpus vocabulary
• words already used as positive elements are excluded
• the numbers of negative and positive training examples are kept equal
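The negative sampling described above could look like this: draw random vocabulary words, exclude the positive words, and draw as many negatives as there are positives. The function name and seed handling are hypothetical. The slides do not specify whether the exclusion covers only the model's own positives (e.g., Subjects for the domain model) or all elements of the relation, so in this sketch the caller passes the set to exclude.

```python
import random


def sample_negatives(vocabulary, exclude, k, seed=0):
    """Draw k distinct words at random from `vocabulary`, never returning a
    word in `exclude` (the positive training words)."""
    # Sorting makes the draw reproducible for a fixed seed.
    candidates = sorted(set(vocabulary) - set(exclude))
    if k > len(candidates):
        raise ValueError("vocabulary has too few non-positive words")
    return random.Random(seed).sample(candidates, k)


vocabulary = ["Obama", "Trump", "Clinton", "Hawaii", "Chicago",
              "book", "river", "red", "quickly", "seven"]
positive_subjects = ["Obama", "Trump", "Clinton"]

# Same number of negatives as positives, as the slides require.
negatives = sample_negatives(vocabulary, positive_subjects, k=len(positive_subjects))
print(negatives)
```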

• models
• domain model
• trained on Subject element feature vectors (positive) and negative word feature vectors
• uses a decision tree-based learning model
• range model
• trained on Object element feature vectors (positive) and negative word feature vectors
• uses a decision tree-based learning model
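The slides specify only "a decision tree-based learning model". In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would likely be used. To stay self-contained, the sketch below implements a small CART-style tree (Gini impurity, axis-aligned thresholds) in plain Python and trains it as a hypothetical domain model on made-up two-dimensional vectors: label 1 for Subject vectors, 0 for negative words. A range model would be trained the same way on Object vectors.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that most reduces weighted Gini
    impurity, or None if no split improves on the parent node."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Midpoints between consecutive distinct values keep both sides non-empty.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree: leaves are labels, inner nodes are (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict(tree, x):
    """Follow thresholds from the root to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Hypothetical domain-model training data: label 1 = Subject vector, 0 = negative word.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
     [0.1, 0.9], [0.2, 0.7], [0.15, 0.3]]
y = [1, 1, 1, 0, 0, 0]

domain_tree = build_tree(X, y)
print(predict(domain_tree, [0.75, 0.2]))  # 1: looks like a Subject
print(predict(domain_tree, [0.1, 0.8]))   # 0: does not
```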

Experiment
• resource
• relations – 32 frequent English relations (selected from among the first 100)
• Cat-1 – range values are distributed over the domain, e.g., candidate
• Cat-2 – range values are concentrated over the domain, e.g., genre
• training examples – entries for the relations
• text corpus – English
• evaluation metric – accuracy
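Accuracy is reported separately on correct entries (pos), incorrect entries (neg), and their mix. The slides do not say how the domain and range model outputs are combined into one verdict per entry; the sketch below assumes, hypothetically, that an entry is predicted correct only if both models accept it. The model outputs are invented for illustration and are not the paper's results.

```python
def entry_prediction(domain_accepts, range_accepts):
    """Hypothetical combination rule: an entry is judged correct (1) only if
    the domain model accepts its Subject and the range model accepts its Object."""
    return 1 if domain_accepts and range_accepts else 0


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need equal-length, non-empty predictions and labels")
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)


# Invented model outputs (domain_accepts, range_accepts) per entry.
pos_outputs = [(True, True), (True, True), (True, False), (True, True)]
neg_outputs = [(False, True), (False, False), (True, False), (True, True), (False, False)]

pos_pred = [entry_prediction(d, r) for d, r in pos_outputs]
neg_pred = [entry_prediction(d, r) for d, r in neg_outputs]

acc_pos = accuracy(pos_pred, [1] * len(pos_pred))  # correct entries should be 1
acc_neg = accuracy(neg_pred, [0] * len(neg_pred))  # incorrect entries should be 0
acc_mix = accuracy(pos_pred + neg_pred, [1] * len(pos_pred) + [0] * len(neg_pred))
print(acc_pos, acc_neg, round(acc_mix, 3))  # 0.75 0.8 0.778
```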

Result
• purpose – show how accurately the models detect correct entries (pos), incorrect entries (neg), and a mix of both (pos + neg)
• finding – words of the same type receive similar feature vectors, so the models generalize over those words

Conclusion
• Observation
• a relation holds elements of the same type as Subject and elements of the same type as Object
• generalizing over Subjects and Objects can automatically generate the domain and range of a relation; the experimental results support this assumption
• Future Work
• explore more sophisticated learning models than decision trees
• investigate word embeddings other than the word2vec default
