This document presents AMDC (Active Multi-relational Data Construction), a method for efficiently constructing RDF datasets through active learning and manual annotation. AMDC uses a multi-relational model to predict labels for unlabeled triples and queries annotators about the most informative ones, repeating this learn-and-query cycle so that the dataset is constructed with fewer queries. In experiments, AMDC requires 2.4-19x fewer queries than random-sampling baselines to construct datasets and achieves better predictive performance, demonstrating the benefits of active learning and of AMDC's design.
Active Learning for Multi-relational Data Construction
1. Active Learning for Multi-relational Data Construction
Hiroshi Kajino (1), Akihiro Kishimoto (2), Adi Botea (2), Elizabeth Daly (2), Spyros Kotoulas (2)
1: The University of Tokyo, Japan; 2: IBM Research - Ireland
2. ■ Research focus: Manual RDF data construction
□ Some data are difficult to extract automatically from documents
Q: How can we efficiently construct the dataset by hand?
■ Our solution: Active learning + multi-relational learning
□ Reduce the number of queries as much as possible
We develop a method to support manual RDF data annotation.
[Figure: loop between the multi-relational model and annotators: 1. query labels of informative triples; 2. annotators return labels; 3. update the dataset & retrain the model]
3. ■ Outline
□ Problem settings:
• Multi-relational (RDF) data and their applications
• Two formulations:
– Dataset construction problem
– Predictive model construction problem
□ Our solution (AMDC):
• Active learning
• Multi-relational learning
□ Experiments
4. ■ Outline
□ Problem settings:
• Multi-relational (RDF) data and their applications
• Two formulations:
– Dataset construction problem
– Predictive model construction problem
□ Our solution (AMDC):
• Active learning
• Multi-relational learning
□ Experiments
5. ■ Multi-relational dataset (RDF format)
□ Triple: t = (i, j, k)
• Entity: i, j ∈ E
• Relation: k ∈ R
□ Label:
• t is positive ⇔ entity i is in relation k with entity j
• t is negative ⇔ entity i is not in relation k with entity j
□ Multi-relational dataset: (Δp, Δn), where Δ is the set of all triples,
Δp = {t ∈ Δ | t is positive}, Δn = {t ∈ Δ | t is negative}
• Assume: |Δp| ≪ |Δ|, and some triples remain unlabeled
A multi-relational dataset consists of binary-labeled triples.
[Figure: example graph over entities Dog, Human, Animal with relations "is a part of" and "is the same as"]
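To make the notation concrete, the representation could look like the following minimal Python sketch; the entity and relation names mirror the figure, and the label assignments are purely illustrative:

```python
from itertools import product

entities = ["Dog", "Human", "Animal"]            # E
relations = ["is a part of", "is the same as"]   # R

# Δ: all candidate triples t = (i, j, k)
all_triples = set(product(entities, entities, relations))

# Δp and Δn: the labeled positive and negative triples (illustrative labels)
positives = {("Dog", "Animal", "is a part of"),
             ("Human", "Animal", "is a part of")}
negatives = {("Dog", "Human", "is the same as")}

# Everything else remains unlabeled; typically |Δp| ≪ |Δ|
unlabeled = all_triples - positives - negatives
assert len(positives) < len(all_triples)
```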
6. ■ Motivation of manual construction
□ Knowledge base: human knowledge encoded in RDF
Point: commonsense knowledge rarely appears in documents
→ Difficult to extract it automatically from documents
□ Biological dataset:
→ Some unknown triples require experiments for labeling
Some RDF datasets require hand annotation by nature.
Dataset: positive triple examples
• WordNet [Miller, 95]: (dog, canine, synset), (dog, poodle, hypernym)
• ConceptNet [Liu+, 04]: (saxophone, jazz, UsedFor), (learn, knowledge, MotivatedByGoal)
[Figure: biological example with entities Protein, DNA, Cell cycle and relations "interact", "participate"]
7. ■ Two problem formulations
□ Inputs:
• Set of entities E, relations R, and an annotator O: Δ → {+1, −1}
(the annotator makes no errors and can be accessed B times)
□ Problem 1: Dataset construction problem
• Output: positive triples Δp
• Note: positive triples are usually quite few
□ Problem 2: Predictive model construction problem
• Output: multi-relational model M: Δ → ℝ, a degree of "positiveness"
• Note: the model can predict labels of unlabeled triples
※ A more direct formulation than Problem 1 if the model is the goal
The two problem settings reflect different usages of a dataset.
8. ■ Outline
□ Problem settings:
• Multi-relational (RDF) data and their applications
• Two formulations:
– Dataset construction problem
– Predictive model construction problem
□ Our solution (AMDC):
• Active learning
• Multi-relational learning
□ Experiments
9. ■ Active Multi-relational Data Construction
□ Overview:
Our solution, AMDC, repeats learning and querying B times.
Train the model using the current training dataset (Δp, Δn).
[Figure: loop between the multi-relational model and annotators: 1. query labels of informative triples; 2. annotators return labels; 3. update the dataset & retrain the model]
10. /28
■ Active Multi-relational Data Construction
□ Overview:
Our solution, AMDC, repeats learning and querying B times
[Figure: AMDC loop (see slide 2)]
Training dataset (Δp, Δn)
AMDC can compute a predictive score st for each t ∈ Δu:
larger/smaller st ⇔ the model believes t is positive/negative
11. /28
■ Active Multi-relational Data Construction
□ Overview:
Our solution, AMDC, repeats learning and querying B times
[Figure: AMDC loop (see slide 2)]
Training dataset (Δp, Δn)
Compute a query score qt for each t ∈ Δu using st:
smaller qt ⇔ t is more informative for dataset construction
13. /28
■ Active Multi-relational Data Construction
□ Details:
• Query scores qt
• Multi-relational model, predictive score st
We explain the details of AMDC in two parts
[Figure: AMDC loop (see slide 2)]
15. /28
■ AMDC (1/2): Query scores
□ Given: predictive score st with threshold 0,
s.t. st > 0 (st < 0) ⇔ the model believes t is positive (negative)
□ Query score qt (t ∈ Δ):
Query the labels of the triples with the smallest qt
• Positiveness score (for Problem 1): qt := -st
Choose triples the model believes to be positive
• Uncertainty score (for Problem 2): qt := |st|
Choose triples the model is uncertain about
※ AMDC handles the two problems just by switching the query score
We employ two different query scores for the two problems
[Figure: score axis st with threshold 0 separating pos from neg]
17. /28
■ AMDC (2/2): Multi-relational model
□ RESCAL [Nickel+,11]:
• Model:
– ai ∈ R^D: latent vector of entity i
– Rk ∈ R^(D×D): latent matrix of relation k
• Predictive score: st = ai^T Rk aj
Large/small st ⇔ t is likely to be positive/negative
□ Additional constraints: |ai| = 1, Rk = rotation matrix
• Reduce the degrees of freedom
• Stabilize learning when few labels are available (at the beginning)
(→ experiments)
We add two constraints to RESCAL to avoid overfitting (new)
18. /28
■ AMDC (2/2): Optimization problem for learning
□ pos AUC-loss: s(pos) > s(non-pos)
• Pros: robust to the pos/neg ratio; unlabeled triples are used
• Cons: neg triples are not explicitly used; no threshold between pos/neg
□ neg AUC-loss (new): s(non-neg) > s(neg)
• Pros: neg triples are explicitly used (→ experiments)
• Cons: no threshold between pos/neg
□ Classification loss (new): s(pos) > 0, s(neg) < 0
• Pros: gives a threshold between pos/neg → able to compute the uncertainty score
• Cons: not robust to the pos/neg ratio; difficult to use unlabeled triples
The two new objective functions are added to overcome the cons:
min pos AUC-loss + neg AUC-loss + classification loss
[Figure: score axis st with positive triples above the unlabeled mass and negative triples below]
22. /28
■ AMDC (2/2): Optimization problem
□ Algorithm: stochastic gradient descent (SGD)
□ Parameters: {ai}i∈E, {Rk}k∈R
□ Hyperparameters: γ, γ', Cn, Ce, D
At each iteration, we choose the best model using a validation set
Margin-based loss functions are optimized using SGD
[Objective: margin-based losses enforcing s(pos) > s(non-pos), s(non-neg) > s(neg), and s(pos) > 0, s(neg) < 0]
23. /28
■ Outline
□ Problem settings:
• Multi-relational (RDF) data and their applications
• Two formulations:
– Dataset construction problem
– Predictive model construction problem
□ Our solution (AMDC):
• Active learning
• Multi-relational learning
□ Experiments
24. /28
■ Experiments
□ Purpose: evaluate 3 contributions of AMDC on the two problems
• Query scores (vs. AMDC + random query)
• Constraints on RESCAL (vs. AMDC - constraints)
• neg AUC-loss (vs. AMDC - neg-AUC)
□ Datasets:
• Annotators are simulated
We evaluate the 3 modifications using ablated variants of AMDC

Dataset                  #(Entity)  #(Relation)  #(Pos)   #(Neg)
Kinships [Denham, 73]    104        26           10,790   270,426
Nations [Rummel, 50-65]  125        57           2,565    8,626
UMLS [McCray, 03]        135        49           6,752    886,273
25. /28
■ Experiments (1/2): Dataset construction problem
Score: %(pos triples collected by AMDC)
□ AMDC shows 2.4-19x improvements over Random
□ Negative triples are helpful when they are abundant (K, U)
□ Effects of the constraints are incremental
AMDC has collected 2.4-19 times as many positive triples as the baselines
10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations)
[Figure: completion rate vs. #(queries) on Kinships, Nations, and UMLS; panels compare Full AMDC, Random (AMDC rand), No neg-AUC (AMDC pos only), and No constraints (AMDC no const)]
26. /28
■ Experiments (2/2): Predictive model construction problem
Score: ROC-AUC
□ AMDC often achieves better AUC than Random (K, U)
□ Negative triples are also helpful to improve ROC-AUC
□ Constraints work to prevent overfitting
AMDC has achieved the best predictive score
10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations)
[Figure: ROC-AUC vs. #(queries) on Kinships, Nations, and UMLS; panels compare Full AMDC, Random (AMDC rand), No neg-AUC (AMDC pos only), and No constraints (AMDC no const)]
27. /28
■ Conclusions
□ Manual RDF dataset construction is still needed
• Some datasets require hand annotation by their nature
• Crowdsourcing provides an easy way of recruiting annotators
It's time to consider the manual construction problem!
□ AMDC = active learning + multi-relational learning
• RESCAL-based multi-relational learning
□ 3 key contributions lead to better performance
• Active learning significantly reduces the cost
• Constraints prevent overfitting
• Negative AUC-loss works better on skewed datasets
We consider the problem of manually annotating RDF data.
Our research focus is manual RDF data construction.
The reason we focus on "manual" construction, rather than "automatic" construction, is that some data are difficult to extract automatically from documents.
So we set our research question as: "How can we support the human annotators?"
Our solution to this research question is to combine active learning and multi-relational learning techniques.
We use a model of the RDF dataset, called a "multi-relational model", to reduce the number of queries as much as possible.
We first define the multi-relational dataset in the RDF format
A multi-relational dataset consists of binary-labeled triples
A triple is made up of two “entities” i & j, and one “relation” k. In this example, a dog, a human, and an animal are entities, and “is_a_part_of” and “is_the_same_as” are relations.
We assign a positive label to triple t if “entity i is in relation k with entity j”, and a negative label otherwise.
In this example, as a dog is a part of an animal, this triple is positive, but as a dog is not the same as a human, this triple is negative.
A multi-relational dataset is defined as sets of positive and negative triples.
Here, we assume that positive triples are much fewer than all the triples, and we allow some triples to remain unlabeled.
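To make these definitions concrete, here is a minimal Python sketch of such a dataset; the entity and relation names follow the slide's example, and all variable names are ours, not the paper's:

```python
from itertools import product

# Entities E and relations R from the slide example (illustrative only).
entities = ["dog", "human", "animal"]
relations = ["is_a_part_of", "is_the_same_as"]

# The set of all triples: Δ = E x E x R.
all_triples = set((i, j, k) for i, j in product(entities, entities) for k in relations)

# Labeled subsets Δp (positive) and Δn (negative); |Δp| << |Δ|,
# and every remaining triple is unlabeled (Δu).
positive = {("dog", "animal", "is_a_part_of")}
negative = {("dog", "human", "is_the_same_as")}
unlabeled = all_triples - positive - negative
```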
Then, we give two examples to motivate our manual construction problems.
The first example is a knowledge base, which encodes human knowledge in the RDF format.
A famous dataset is the WordNet, which represents the relations between words.
Another example is the ConceptNet, which represents commonsense knowledge.
The point here is that commonsense knowledge rarely appears in documents.
So it is difficult to extract such a dataset automatically from documents.
The second example is a biological dataset.
Examples include interactions between proteins and DNA, and the participation of chemical compounds in a biological mechanism such as the cell cycle.
To label a triple, researchers have to conduct experiments; therefore, biological datasets must be constructed by hand annotation.
Finally, I’m going to state the formal problem settings of manual dataset construction.
We note that there are two problem settings, depending on how the dataset will be used.
The first problem setting is called a “dataset construction problem”.
The goal of this problem setting is to collect as many positive triples as possible.
If the goal is to obtain a dataset, it is sufficient to collect the positive triples.
The second problem setting is called a “predictive model construction problem”.
The goal of this problem setting is to learn a multi-relational model, which predicts labels of unlabeled triples.
First of all, I will show you the overview of AMDC.
Given the initial training dataset, AMDC trains the multi-relational model using the current dataset.
Then, AMDC can compute predictive scores for the unlabeled triples.
A larger score means that the model believes the triple is likely to be positive.
The model then computes query scores.
A smaller query score means the triple is more informative for the model.
Based on the query scores, the model chooses informative triples and queries their labels.
The annotators return the labels, and AMDC updates the dataset and retrains the model on the updated dataset.
AMDC repeats this procedure B times and finally outputs the model and the dataset.
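This loop can be summarized in a short schematic sketch; the `train`, `predict_score`, `query_score`, and `annotator` callables are placeholders for the components described on the following slides, not the authors' code:

```python
def amdc(positive, negative, unlabeled, annotator, train, predict_score, query_score, B, q):
    """Schematic AMDC loop: repeat learning and querying B times, q queries per round."""
    model = None
    for _ in range(B):
        model = train(positive, negative)                    # retrain on current labels
        s = {t: predict_score(model, t) for t in unlabeled}  # predictive scores s_t
        # ask the annotator about the q triples with the smallest query scores q_t
        for t in sorted(unlabeled, key=lambda t: query_score(s[t]))[:q]:
            label = annotator(t)                             # oracle returns +1 or -1
            (positive if label == +1 else negative).add(t)
            unlabeled.remove(t)
    return model, positive
```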
Then, I’m going to present the details of AMDC.
Query scores are used to choose informative triples.
We design two query scores for the two problem settings.
The query scores are computed based on predictive scores given by the multi-relational model.
We assume that the predictive score has a threshold at 0 to discriminate positive from negative triples.
The first query score, called the "positiveness score", is designed for Problem 1.
It chooses triples the model believes to be positive.
The second score, called the "uncertainty score", is designed for Problem 2; it chooses triples that the model is uncertain about.
AMDC handles the two problem settings just by switching this query score, and therefore, the other parts of AMDC are common between the two problem settings.
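In code, the two query scores from the slide are one-liners (s_t denotes the model's predictive score for triple t):

```python
def positiveness_score(s_t):
    # Problem 1: q_t = -s_t, so the smallest q_t picks the triple
    # the model believes most strongly to be positive.
    return -s_t

def uncertainty_score(s_t):
    # Problem 2: q_t = |s_t|, so the smallest q_t picks the triple
    # whose score is closest to the threshold 0 (most uncertain).
    return abs(s_t)
```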
A multi-relational model of AMDC is based on RESCAL.
RESCAL models each entity as a latent vector and each relation as a latent matrix.
The predictive score of RESCAL is written as st = ai^T Rk aj; the model is trained so that a larger score indicates the triple is more likely to be positive.
We introduce additional constraints to RESCAL in order to stabilize learning when only a few labels are available.
Specifically, the latent vectors are restricted to the unit sphere, and the latent matrices are restricted to rotation matrices.
We confirm the effect of adding these constraints in experiments.
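A small NumPy sketch of the score and the two constraints follows; enforcing the constraints by projection after each update is our assumption, not necessarily how the paper implements them:

```python
import numpy as np

def score(a_i, R_k, a_j):
    """RESCAL predictive score s_t = a_i^T R_k a_j."""
    return a_i @ R_k @ a_j

def project_entity(a):
    """Enforce |a_i| = 1 by renormalizing."""
    return a / np.linalg.norm(a)

def project_relation(R):
    """Project R_k onto the rotation matrices (nearest rotation via SVD)."""
    U, _, Vt = np.linalg.svd(R)
    if np.linalg.det(U @ Vt) < 0:   # enforce det = +1, not just orthogonality
        U[:, -1] *= -1
    return U @ Vt
```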
Our model is trained by solving this optimization problem.
The first term is a typical AUC loss, which encourages the predictive score of a positive triple to be larger than that of a non-positive triple.
As this objective is robust to the positive-negative ratio of the training dataset, it is often used to learn multi-relational models.
However, we find two issues with this objective.
The first issue is that this AUC loss does not distinguish negative triples from unlabeled triples.
In order to use the negative triples effectively, we add a negative counterpart of the AUC loss, which encourages the score of a non-negative triple to be larger than that of a negative triple.
As a result, we can effectively use both positive and negative triples. The effect of adding this term is also checked in the experiments.
The second issue is that the AUC losses cannot learn the threshold that discriminates positive from negative triples, which is necessary to compute the query scores.
So we add a classification loss to calibrate the scores to have the threshold at 0.
As a result, we obtain this optimization problem to learn the model.
We use a stochastic gradient descent algorithm to solve the problem.
We choose the best set of hyperparameters using a validation set.
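Putting the three terms together, a toy version of the objective could look like this; the hinge form and the exact placement of the margins γ, γ' and weights Cn, Ce are assumptions based on the slide, not the paper's formula (and in practice SGD would sample pairs rather than sum over all of them):

```python
def hinge(x):
    return max(0.0, x)

def amdc_objective(s, pos, neg, unlabeled, gamma, gamma_p, C_n, C_e):
    """pos AUC-loss + neg AUC-loss + classification loss over scores s[t]."""
    non_pos = list(neg) + list(unlabeled)
    non_neg = list(pos) + list(unlabeled)
    # pos AUC-loss: encourage s(pos) > s(non-pos) by margin gamma
    pos_auc = sum(hinge(gamma - s[p] + s[t]) for p in pos for t in non_pos)
    # neg AUC-loss: encourage s(non-neg) > s(neg) by margin gamma
    neg_auc = sum(hinge(gamma - s[t] + s[n]) for n in neg for t in non_neg)
    # classification loss: encourage s(pos) > 0 and s(neg) < 0 by margin gamma'
    clf = (sum(hinge(gamma_p - s[p]) for p in pos) +
           sum(hinge(gamma_p + s[n]) for n in neg))
    return pos_auc + C_n * neg_auc + C_e * clf
```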
The main purpose of our experiments is to evaluate 3 contributions of AMDC in both problem settings.
The contributions are the query scores, the constraints on RESCAL, and the negative part of the AUC loss function.
We remove each part of AMDC in turn to create three competing methods.
We use three datasets.
Annotators are simulated using the labels of these datasets.
The first experiment handles the dataset construction problem.
The score here is the percentage of positive triples collected by AMDC.
The x-axis of the charts is the number of queries, and the y-axis is the score.
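For concreteness, the completion rate on the y-axis could be computed as follows (a hypothetical helper, not from the paper):

```python
def completion_rate(collected, all_positive):
    """Fraction of all positive triples that have been collected so far."""
    return len(collected & all_positive) / len(all_positive)
```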
We find that AMDC shows its largest improvement over the random strategy on the UMLS dataset.
Since UMLS is the most skewed dataset, this confirms that AMDC is robust to the positive-negative ratio.
Also, the negative part of the AUC loss function is helpful when there are many negative triples.
However, the additional constraints are not so effective in this context.
The second experiment handles the predictive model construction problem.
The score is the Area Under the ROC curve.
We first find that the full AMDC always achieves the best ROC-AUC.
Kinships and UMLS datasets show significant improvements over the random strategy.
In this problem setting, both the negative part of the AUC loss and the constraints work to improve the performance.
Therefore, we conclude that these two modifications work positively for this problem setting.
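The ROC-AUC used here can be computed directly from the model's predictive scores, e.g. with scikit-learn's roc_auc_score (the arrays below are illustrative):

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 0, 0, 1]                # held-out triple labels (1 = positive, 0 = negative)
y_score = [0.9, -0.1, -0.3, 0.1, 0.7]   # predictive scores s_t from the model
print(roc_auc_score(y_true, y_score))   # ~0.83 for this toy example
```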