Active Learning for
Multi-relational Data Construction
Hiroshi Kajino1, Akihiro Kishimoto2, Adi Botea2
Elizabeth Daly2, Spyros Kotoulas2
1: The University of Tokyo, Japan, 2: IBM Research - Ireland
■ Research focus: Manual RDF data construction
□ Some data are difficult to extract automatically from docs
Q: How can we efficiently construct the dataset by hand?
■ Our solution: Active learning + multi-relational learning
□ Reduce the number of queries as much as possible
We develop a method to support manual RDF data annotation
[Diagram: loop between the multi-relational model and annotators: 1. query labels of informative triples; 2. return labels; 3. update the dataset & retrain the model]
■ Outline
□ Problem settings:
• Multi-relational (RDF) data and their applications
• Two formulations:
– Dataset construction problem
– Predictive model construction problem
□ Our solution (AMDC):
• Active learning
• Multi-relational learning
□ Experiments
■ Multi-relational dataset (RDF format)
□ Triple: t = (i, j, k)
• Entity: i, j ∈ E
• Relation: k ∈ R
□ Label:
• t is positive ⇔ entity i is in relation k with entity j
• t is negative ⇔ entity i is not in relation k with entity j
□ Multi-relational dataset: (Δp, Δn), where Δ is the set of all triples and
Δp = {t ∈ Δ | t is positive}, Δn = {t ∈ Δ | t is negative}
• Assume: |Δp| ≪ |Δ|, and some triples remain unlabeled
A multi-relational dataset consists of binary-labeled triples (a data-structure sketch follows)
[Diagram: entities Dog, Human, Animal; (Dog, Animal, is_a_part_of) and (Human, Animal, is_a_part_of) are positive; (Dog, Human, is_the_same_as) is negative]
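As a hedged illustration of these definitions, the following sketch represents a toy dataset in Python; all names are ours, not from the paper.

```python
# Toy multi-relational dataset: Δ, Δp, Δn, and the unlabeled remainder.
from itertools import product

entities = ["dog", "human", "animal"]           # E
relations = ["is_a_part_of", "is_the_same_as"]  # R

# Δ: the set of all candidate triples t = (i, j, k)
all_triples = {(i, j, k)
               for i, j in product(entities, entities)
               for k in relations}

# (Δp, Δn): the labeled portion; everything else stays unlabeled
positive = {("dog", "animal", "is_a_part_of"),
            ("human", "animal", "is_a_part_of")}
negative = {("dog", "human", "is_the_same_as")}
unlabeled = all_triples - positive - negative

assert len(positive) < len(all_triples)  # |Δp| ≪ |Δ|
```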
■ Motivation of manual construction
□ Knowledge base: Human knowledge encoded in RDF
Point: Commonsense knowledge rarely appears in docs
→ Difficult to extract it automatically from documents
□ Biological dataset:
→ Some unknown triples require experiments for labeling
Some RDF datasets inherently require hand annotation
Dataset / positive triple examples:
• WordNet [Miller, 95]: (dog, canine, synset), (dog, poodle, hypernym)
• ConceptNet [Liu+, 04]: (saxophone, jazz, UsedFor), (learn, knowledge, MotivatedByGoal)
[Diagram: a protein interacts with DNA; a compound participates in the cell cycle]
■ Two problem formulations
□ Inputs:
• Set of entities E, set of relations R, and an annotator O: Δ → {+1, −1}
(the annotator makes no errors and can be accessed B times)
□ Problem 1: Dataset construction problem
• Output: Positive triples Δp
• Note: Positive triples are usually quite few
□ Problem 2: Predictive model construction problem
• Output: Multi-relational model M: Δ → ℝ, giving a degree of "positiveness"
• Note: The model can predict labels of unlabeled triples
※ More direct formulation than Prob. 1 if the model is the goal
Two problem settings reflect different usages of a dataset (a simulated-annotator sketch follows)
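The annotator interface can be pictured with a minimal sketch; the class and its fields are illustrative assumptions, not the paper's code.

```python
# Error-free annotator O: Δ -> {+1, -1} with a budget of B accesses.
class Annotator:
    def __init__(self, positive_triples, budget):
        self.positive_triples = positive_triples  # ground-truth Δp
        self.budget = budget                      # B allowed accesses

    def label(self, triple):
        """Return the true label of a triple (no annotation errors)."""
        if self.budget <= 0:
            raise RuntimeError("query budget B exhausted")
        self.budget -= 1
        return +1 if triple in self.positive_triples else -1
```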
■ Active Multi-relational Data Construction (AMDC)
□ Overview: Our solution, AMDC, repeats learning and querying B times
[Diagram: loop between the multi-relational model, the training dataset (Δp, Δn), and the annotators: 1. query labels of informative triples; 2. return labels; 3. update the dataset & retrain the model]
• Train the model using the current training dataset (Δp, Δn)
• The model computes a predictive score st for each unlabeled triple t ∈ Δu: the larger (smaller) st, the more the model believes t is positive (negative)
• Compute a query score qt (t ∈ Δu) using st: a smaller qt means t is more informative, so the labels of the triples with the smallest qt are queried
(see the loop sketch below)
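A hedged sketch of this loop; `Model`, `fit`, `score`, and the batching are our assumptions, `Annotator` is the sketch above, and `select_queries` is defined in the query-score sketch further below.

```python
# AMDC outer loop: repeat (retrain, score, query, update) until B queries.
def amdc_loop(model, annotator, positive, negative, unlabeled,
              n_rounds, batch_size, problem):
    for _ in range(n_rounds):
        model.fit(positive, negative)                    # 3. retrain
        scores = {t: model.score(t) for t in unlabeled}  # predictive s_t
        for t in select_queries(scores, problem, batch_size):  # 1. query
            if annotator.label(t) == +1:                 # 2. labels returned
                positive.add(t)
            else:
                negative.add(t)
            unlabeled.discard(t)
    return model, positive
```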
■ Active Multi-relational Data Construction
□ Details: We explain AMDC in two parts:
• Query scores qt
• The multi-relational model and its predictive score st
■ AMDC (1/2): Query scores
□ Given: predictive score st with threshold 0,
s.t. st > 0 (st < 0) ⇔ the model believes t is positive (negative)
□ Query score qt (t ∈ Δu): query the labels of the triples with the smallest qt
• Positiveness score (for Problem 1): qt := −st
Chooses triples the model believes to be positive
• Uncertainty score (for Problem 2): qt := |st|
Chooses triples about which the model is uncertain
※ AMDC handles the two problems just by switching the query score
We employ two different query scores for the two problems (sketched below)
[Diagram: score axis st with threshold 0; triples with st < 0 are predicted negative, st > 0 positive]
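A minimal sketch of the two query scores; the function name and batch selection are our assumptions.

```python
# q_t = -s_t (positiveness, Problem 1) or q_t = |s_t| (uncertainty, Problem 2).
def select_queries(scores, problem, batch_size):
    """scores: dict mapping an unlabeled triple to its predictive score s_t."""
    if problem == 1:
        q = {t: -s for t, s in scores.items()}
    else:
        q = {t: abs(s) for t, s in scores.items()}
    # query the labels of the triples with the smallest q_t
    return sorted(q, key=q.get)[:batch_size]
```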
■ AMDC (2/2): Multi-relational model
□ RESCAL [Nickel+, 11]:
• Model:
– a_i ∈ ℝ^D: latent vector of entity i
– R_k ∈ ℝ^{D×D}: latent matrix of relation k
• Predictive score: s_t = a_i^T R_k a_j
The larger (smaller) s_t, the more likely t is positive (negative)
□ Additional constraints (new): ‖a_i‖ = 1, R_k is a rotation matrix
• Reduce the degrees of freedom
• Stabilize learning when few labels are available (at the beginning)
(→ experiments)
We add two constraints to RESCAL to avoid overfitting (a sketch follows)
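A hedged sketch of the constrained score; the random initialization and the SVD projection onto rotation matrices are standard devices we assume, not necessarily the authors' exact update rule (sizes follow the Kinships dataset).

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_entities, n_relations = 10, 104, 26   # D is a hyperparameter

# Entity vectors constrained to the unit sphere: ||a_i|| = 1
A = rng.normal(size=(n_entities, D))
A /= np.linalg.norm(A, axis=1, keepdims=True)

def random_rotation(d):
    """Draw a random rotation matrix (orthogonal with det = +1)."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

R = np.stack([random_rotation(D) for _ in range(n_relations)])

def score(i, j, k):
    """Predictive score s_t = a_i^T R_k a_j for the triple t = (i, j, k)."""
    return A[i] @ R[k] @ A[j]

def project_to_rotation(M):
    """Snap an updated R_k back to the nearest rotation matrix via SVD."""
    u, _, vt = np.linalg.svd(M)
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1
    return u @ vt
```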
■ AMDC (2/2): Optimization problem for learning
Two objective functions are added to overcome the cons of the standard pos AUC-loss:
min  pos AUC-loss + neg AUC-loss + classification loss
□ pos AUC-loss (induces s(pos) > s(non-pos))
• Pros: robust to the pos/neg ratio; unlabeled triples are used
• Cons: neg triples are not explicitly used; no threshold between pos/neg
□ neg AUC-loss (new; induces s(non-neg) > s(neg))
• Pro: neg triples are explicitly used (→ experiments)
• Con: no threshold between pos/neg
□ Classification loss (new; induces s(pos) > 0 and s(neg) < 0)
• Pro: provides a threshold between pos and neg → able to compute the uncertainty score
• Cons: not robust to the pos/neg ratio; difficult to use unlabeled triples
[Diagram: score axis st; the pos AUC-loss places pos above unlabeled triples, the neg AUC-loss places neg below them, and the classification loss separates pos and neg at threshold 0]
■ AMDC (2/2): Optimization problem
□ Algorithm: Stochastic gradient descent (SGD)
□ Parameters: latent vectors {a_i} and relation matrices {R_k}
□ Hyperparameters: γ, γ′, Cn, Ce, and the latent dimension D
□ The three terms are margin-based (hinge) losses encouraging
s(pos) > s(non-pos), s(non-neg) > s(neg), and s(pos) > 0 > s(neg)
At each iteration, we choose the best model by using a validation set
Margin-based loss functions are optimized using SGD (a sketch follows)
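A hedged sketch of the combined objective on one sampled tuple of triples; which margin (γ vs. γ′) and weight (Cn, Ce) attaches to which term is our assumption, and [x]_+ denotes the hinge max(0, x).

```python
def hinge(x):
    """[x]_+ = max(0, x)."""
    return max(0.0, x)

def combined_loss(s_pos, s_nonpos, s_nonneg, s_neg,
                  gamma, gamma_prime, C_n, C_e):
    pos_auc = hinge(gamma - (s_pos - s_nonpos))   # encourages s(pos) > s(non-pos)
    neg_auc = hinge(gamma - (s_nonneg - s_neg))   # encourages s(non-neg) > s(neg)
    clf = hinge(gamma_prime - s_pos) + hinge(gamma_prime + s_neg)  # s(pos) > 0 > s(neg)
    # SGD samples such tuples and updates {a_i}, {R_k} by the gradient,
    # followed by projection back onto the unit sphere / rotation matrices.
    return pos_auc + C_n * neg_auc + C_e * clf
```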
■ Experiments
□ Purpose: Evaluate the 3 contributions of AMDC in the two problems
• Query scores (vs. AMDC + random query)
• Constraints on RESCAL (vs. AMDC − constraints)
• neg AUC-loss (vs. AMDC − neg-AUC)
□ Datasets (annotators are simulated):
We evaluate the 3 modifications using ablated variants of AMDC

Dataset                    #(Entity)  #(Relation)  #(Pos)   #(Neg)
Kinships [Denham, 73]      104        26           10,790   270,426
Nations [Rummel, 50-65]    125        57           2,565    8,626
UMLS [McCray, 03]          135        49           6,752    886,273
■ Experiments (1/2): Dataset construction problem
Score: %(pos triples collected by AMDC), i.e., the completion rate
□ AMDC shows 2.4 – 19 times improvement over Random
□ Negative triples are helpful when they are abundant (Kinships, UMLS)
□ Effects of the constraints are incremental
AMDC collects 2.4 – 19 times as many positive triples as the baselines
10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations)
[Plots: completion rate vs. #(queries) on Nations, UMLS, and Kinships, comparing the full AMDC, AMDC rand (random queries), AMDC pos only (no neg-AUC), and AMDC no const (no constraints)]
■ Experiments (2/2): Predictive model construction problem
Score: ROC-AUC
□ AMDC often achieves better AUC than Random (Kinships, UMLS)
□ Negative triples are also helpful for improving ROC-AUC
□ The constraints work to prevent overfitting
AMDC achieves the best predictive score
[Plots: ROC-AUC vs. #(queries) on Kinships, Nations, and UMLS for the same four methods; 10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations)]
■ Conclusions
□ Manual RDF dataset construction is still in demand
• Some datasets require hand annotation by their nature
• Crowdsourcing provides an easy way of recruiting annotators
It's time to consider the manual construction problem!
□ AMDC = active learning + multi-relational learning
• RESCAL-based multi-relational learning
□ 3 key contributions lead to better performance
• Active learning significantly reduces the annotation cost
• The constraints prevent overfitting
• The negative AUC-loss works better on skewed datasets
We considered manual annotation problems for RDF data
Thank you!
Editor's Notes
1. Our research focus is manual RDF data construction. The reason why we focus on “manual” construction, rather than “automatic” construction, is that some data are difficult to extract automatically from documents. So, we set our research question as: “How can we support the human annotators?” Our solution to this research question is to combine active learning and multi-relational learning techniques. We use a model of the RDF dataset called “a multi-relational model” to reduce the number of queries as much as possible.
2. We first define the multi-relational dataset in the RDF format. A multi-relational dataset consists of binary-labeled triples. A triple is made up of two “entities” i & j, and one “relation” k. In this example, a dog, a human, and an animal are entities, and “is_a_part_of” and “is_the_same_as” are relations. We assign a positive label to triple t if “entity i is in relation k with entity j”, and a negative label otherwise. In this example, as a dog is a part of an animal, this triple is positive, but as a dog is not the same as a human, that triple is negative. A multi-relational dataset is defined as sets of positive and negative triples. Here, we assume that positive triples are much fewer than all the triples, and we allow some triples to remain unlabeled.
3. Then, we give two examples to motivate our manual construction problems. The first example is a knowledge base, which encodes human knowledge in the RDF format. A famous dataset is WordNet, which represents the relations between words. Another example is ConceptNet, which represents commonsense knowledge. The point here is that commonsense knowledge rarely appears in documents, so it is difficult to extract such a dataset automatically from documents. The second example is a biological dataset: for example, interactions between proteins and DNA, or the participation of chemical compounds in a biological mechanism such as the cell cycle. To label a triple, researchers have to conduct experiments, and therefore biological datasets require hand annotation.
4. Finally, I’m going to state the formal problem settings of manual dataset construction. We notice that there are two problem settings, depending on the usage of the dataset. The first problem setting is called a “dataset construction problem”. The goal of this problem setting is to collect as many positive triples as possible: if the goal is to obtain a dataset, it is sufficient to collect the positive triples. The second problem setting is called a “predictive model construction problem”. The goal of this problem setting is to learn a multi-relational model, which predicts labels of unlabeled triples.
  5. First of all, I will show you the overview of AMDC. Given the initial training dataset, AMDC trains the multi-relational model using the current dataset.
6. Then, AMDC is able to compute predictive scores for unlabeled triples. A larger score means that the model believes the triple is more likely to be positive.
7. The model computes query scores. A smaller query score means the triple is more informative for the model.
8. Based on the query scores, the model chooses informative triples and queries their labels. Then, annotators return the labels, the model updates the dataset, and it is retrained using the updated dataset. AMDC repeats this procedure B times and finally outputs the model and the dataset.
  9. Then, I’m going to present the details of AMDC.
10. Query scores are used to choose informative triples. We design two query scores for the two problem settings. The query scores are computed from the predictive scores given by the multi-relational model. We assume that the predictive score has threshold 0 to discriminate positive from negative triples. The first query score, called a “positiveness score”, is designed for Problem 1: it chooses triples the model believes to be positive. The second score, called an “uncertainty score”, is designed for Problem 2: it chooses triples about which the model is uncertain. AMDC handles the two problem settings just by switching this query score, and therefore the other parts of AMDC are common between the two problem settings.
11. The multi-relational model of AMDC is based on RESCAL. RESCAL models each entity as a latent vector and each relation as a latent matrix. The predictive score of RESCAL is written in this form. The model is trained so that a larger score indicates the triple is more likely to be positive. We introduce additional constraints to RESCAL in order to stabilize learning when few labeled triples are available. Specifically, the latent vectors are restricted to the unit sphere, and the latent matrices are restricted to rotation matrices. We confirm the effect of adding these constraints in the experiments.
12. Our model is trained using this optimization problem. The first term is a typical AUC loss function, which induces the predictive score of a positive triple to be larger than that of a non-positive triple. As this objective function is robust to the positive-negative ratio of a training dataset, it is often used to learn a multi-relational data model. However, we find two issues with this objective function. The first issue is that this AUC loss function does not distinguish negative triples from unlabeled triples. In order to use the negative triples effectively, we add a negative part of the AUC loss function, which induces the score of a non-negative triple to be larger than the score of a negative triple. As a result, we can effectively use both positive and negative triples; the effect of adding this objective function is also checked in the experiments. The second issue is that the AUC loss functions cannot learn the threshold to discriminate positive from negative triples, which is necessary to compute query scores. So we add the classification error function to calibrate the scores to have threshold 0.
  16. As a result, we obtain this optimization problem to learn the model. We use a stochastic gradient descent algorithm to solve the problem. We choose the best set of hyperparameters using a validation set.
17. The main purpose of our experiments is to evaluate the 3 contributions of AMDC in both problem settings. The contributions are the query scores, the constraints on RESCAL, and the negative part of the AUC loss function. We remove each part of AMDC to create three competing methods. We use three datasets; annotators are simulated using the labels of these datasets.
18. The first experiment handles the dataset construction problem. The score here is the percentage of positive triples collected by AMDC. The x-axis of the charts is the number of queries, and the y-axis is the score. We find that AMDC shows the largest improvement over the random strategy on the UMLS dataset; since UMLS is the most skewed dataset, this confirms that AMDC is robust to the positive-negative ratio. Also, the negative part of the AUC loss function is helpful when there are many negative triples. However, the additional constraints are not so effective in this setting.
  19. The second experiment handles the predictive model construction problem. The score is the Area Under the ROC curve. We first find that the full AMDC always achieves the best ROC-AUC. Kinships and UMLS datasets show significant improvements over the random strategy. In this problem setting, both the negative part of the AUC loss and the constraints work to improve the performance. Therefore, we conclude that these two modifications work positively for this problem setting.