Link Discovery Tutorial
Part II: Accuracy
Axel-Cyrille Ngonga Ngomo(1)
, Irini Fundulaki(2)
, Mohamed Ahmed Sherif(1)
(1) Institute for Applied Informatics, Germany
(2) FORTH, Greece
October 18th, 2016
Kobe, Japan
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 1 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 2 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 3 / 54
Introduction
Link Discovery as Classification Task
Definition (Declarative Link Discovery)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
Introduction
Link Discovery as Classification Task
Definition (Declarative Link Discovery)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}
Definition (Classification perspective)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : C(s, t) = +1}
Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
Introduction
Link Discovery as Classification Task
Definition (Declarative Link Discovery)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}
Definition (Classification perspective)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : C(s, t) = +1}
Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ
Classical machine learning problem [Ngo+11; NL12]
Dedicated techniques perform better
Unsupervised, active and unsupervised techniques possible
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
Introduction
Challenge
Challenges
1 Creation of labeled training data tedious
2 Need automated means for automatic class and property matching
3 Need for efficient execution of link specifications
4 Dedicated machine learning approaches necessary
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54
Introduction
Challenge
Challenges
1 Creation of labeled training data tedious
2 Need automated means for automatic class and property matching
3 Need for efficient execution of link specifications
4 Dedicated machine learning approaches necessary
Solutions
1 Use active learning approach for link discovery
2 Rely on hospital/resident algorithm
3 See previous section
4 Topic of this section
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 6 / 54
RAVEN
Approach
Definition (Classification perspective)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : C(s, t) = +1}
Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
RAVEN
Approach
Definition (Classification perspective)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : C(s, t) = +1}
Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ
Learning classifier C involves learning
1 Two sets of restrictions that specify the sets S resp. T,
2 the components σ1 . . . σn of a complex similarity measure σ
3 a set of thresholds θ1, ..., θn for σ1, . . . , σn
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
RAVEN
Approach
Definition (Classification perspective)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : C(s, t) = +1}
Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ
Learning classifier C involves learning
1 Two sets of restrictions that specify the sets S resp. T,
2 the components σ1 . . . σn of a complex similarity measure σ
3 a set of thresholds θ1, ..., θn for σ1, . . . , σn
Assumptions
Restrictions are class restrictions
Classifier shape is given (e.g., linear combination)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
RAVEN
Approach
Class and Property Restrictions
Define class similarity function
Solve corresponding hospital-resident problem
Based on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54
RAVEN
Approach
Class and Property Restrictions
Define class similarity function
Solve corresponding hospital-resident problem
Based on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54
RAVEN
Approach
Class and Property Restrictions
Define class similarity function
Solve corresponding hospital-resident problem
Based on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 9 / 54
RAVEN
Approach
Class and Property Restrictions
Define class similarity function
Solve corresponding hospital-resident problem
Based on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 10 / 54
RAVEN
Approach
Class and Property Restrictions
Define class similarity function
Solve corresponding hospital-resident problem
Based on extension of stable marriage problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 11 / 54
RAVEN
Approach
Class Restrictions
Similarity function
String similarity
Number of shared property values amongst instances
. . .
Solve corresponding hospital-resident problem
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54
RAVEN
Approach
Class Restrictions
Similarity function
String similarity
Number of shared property values amongst instances
. . .
Solve corresponding hospital-resident problem
Source Target S T
Drugbank Disesome Targets Genes
Sider Diseasome Side-Effect Diseases
DBpedia Dailymed Organization Organization
Sider Dailymed Drugs Offer
Drugbank DBpedia Targets Protein
Property mapping similar
Leads to σ1 . . . σn
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54
RAVEN
Approach
Learning Threshold
Active perceptron learning
Begin with educated guess, e.g., θi = 0.9
Update thresholds based on most informative examples
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54
RAVEN
Approach
Learning Threshold
Active perceptron learning
Begin with educated guess, e.g., θi = 0.9
Update thresholds based on most informative examples
Guess initial classifier
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54
RAVEN
Approach
Learning Threshold
Active perceptron learning
Begin with educated guess, e.g., θi = 0.9
Update thresholds based on most informative examples
Pick most informative examples, i.e., unclassified and closest to boundary
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 14 / 54
RAVEN
Approach
Learning Threshold
Active perceptron learning
Begin with educated guess, e.g., θi = 0.9
Update thresholds based on most informative examples
Ask for classification from oracle
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 15 / 54
RAVEN
Approach
Learning Threshold
Active perceptron learning
Begin with educated guess, e.g., θi = 0.9
Update thresholds based on most informative examples
Update classifier
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 16 / 54
RAVEN
Evaluation
Evaluation on Diseases (Diseasome to DBpedia)
Learning rate = 0.02
10 questions/iteration
F-measure of up to 92%
1 3 5 7 9 11 13 15 17 19 21 23 25
Number of iterations
0
10
20
30
40
50
60
70
80
90
100
P (%)
R (%)
F (%)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 17 / 54
RAVEN
Evaluation
Learning rate = 0.02
10 questions/iteration
1 3 5 7 9 11 13 15 17 19 21 23 25
Number of iterations
10
100
1000
Runtime(ms)
Diseases
Drugs
Side Effects
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 18 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 19 / 54
Eagle
Efficient Active Learning of Link Specifications using Genetic Programming
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 20 / 54
Eagle
Efficient Active Learning of Link Specifications using Genetic Programming
Eagle
Provides means for automatic class and property matching
Minimizes human labeling effort through active learning
Allow for learning generic specs (limitation of RAVEN)
Similar approaches [NIK+12; ISE+12]
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 21 / 54
Eagle
Formal Definition
Same formal setting as RAVEN
Two sets of restrictions resp. that specify the sets S resp. T,
a specification of mapping properties (p1, q1), . . . , (pn, qn) for the elements of
S and T and
a specification of a complex similarity measure σ as the combination of several
atomic similarity measures σ1, . . . , σn and of a set of thresholds θ1, . . . , θn such
that θi is the threshold for σi .
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 22 / 54
Eagle
LS example
Can learn generic classifier type
f (levenshtein(:title, :title), 0.53)
f (cosine(:venue, :year), 1.00)
f (jaccard(:title, :authors), 0.43)
f (trigrams(:title, :year), 1.00)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 23 / 54
Eagle
Idea & Goal
Eagle
Idea: Specifications are trees
Goal: Learn elements of trees through genetic operations until best LS is
found
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 24 / 54
Eagle Algorithm
Step 1: Generate initial population
Random process (property pairs, thresholds)
Compute fitness
Fitness = F-Measure w.r.t known data
(m1, θ1)
p1 q1
(m2, θ2)
p2 q2
(m3, θ3)
p3 q3
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 25 / 54
Eagle Algorithm
Step 2: Evolve population
Tournament between two individuals
Two operators: Mutation and crossover
(m1, θ1)
p1 q1
(m3, θ3)
p2 q2
(m2, θ2)
p3 q3
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle Algorithm
Step 2: Evolve population
Tournament between two individuals
Two operators: Mutation and crossover
(m1, θ1)
p1 q1
(m3, θ3)
p2 q2
(m2, θ2)
p3 q3
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle Algorithm
Step 2: Evolve population
Tournament between two individuals
Two operators: Mutation and crossover
(m1, θ1)
p1 q1
(m3, θ3)
p2 q2
(m2, θ2)
p3 q3
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle Algorithm
Step 2: Evolve population
Tournament between two individuals
Two operators: Mutation and crossover
p1 q1
(m1, θ1 + α) (m3, θ3)
p2 q2
(m2, θ2)
p3 q3
(m4, θ4) (m5, θ5)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle Algorithm
Step 2: Evolve population
Tournament between two individuals
Two operators: Mutation and crossover
p1 q1
(m1, θ1 + α) (m3, θ3)
p2 q2
(m2, θ2)
p3 q3
(m4, θ4)
p3 q3
(m2, θ2)
p3 q3
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
Eagle Algorithm
Step 3: Computation of most informative links
Previous approaches define amount of information of link as closeness to the
decision boundary
Here, use disagreement amongst elements of population of size n
δ((s, t)) = (n − |Mt
i : (s, t) ∈ Mi )|)(n − |Mt
i : (s, t) /∈ Mi |)
Function is maximal when n
2 count (s, t) as positive and n
2 as negative
Can be modeled with other functions such as entropy
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 27 / 54
Eagle Algorithm
Step 4: Active Learning
Compute d((s, t)) for all (s, t) returned by a LS
Pick k most informative
Require labeling from user
Update list of positive and negative examples
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 28 / 54
Eagle Algorithm
Step 5: Remove least fit elements
Fitness = F-Measure w.r.t known data
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
Eagle Algorithm
Step 5: Remove least fit elements
Fitness = F-Measure w.r.t known data
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
Eagle Algorithm
Step 5: Remove least fit elements
Fitness = F-Measure w.r.t known data
(m1, θ1 + α)
p1 q1
(m2, θ2)
p3 q3
(m3, θ3)
p2 q2
(m4, θ4) (m2, θ2)
p3 q3 p2 q2
If termination conditions not met, goto Step 2
Else terminate and pick fittest LS
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
Eagle Algorithm
Unsupervised learning
Measure degree of monogamy of links [NIK+12]
Only works for 1-1 relations, e.g., owl:sameAs
P(M) =
|{s|∃t : (s, t) ∈ M}|
s
|{t : (s, t) ∈ M}|
, R(M) =
|{t|∃s : (s, t) ∈ M}|
t
|{s : (s, t) ∈ M}|
. (1)
Fβ
(M) = (1 + β2
)
Pd (M)Rd (M)
β2Pd (M) + Rd (M)
(2)
s1
s2
s3
t1
t2
t3
t4
link
link
link
link
P = 3/4, R = 2/4, F = 3/5.
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 30 / 54
Eagle
Experiments and Results
Experimental Setup:
Compared batch learning and genetic programming
Used 3 different data sets
1 Dailymed-Drugbank (LATC)
2 DBpedia-LinkedMDB (LATC)
3 DBLP-ACM
Compared different sizes of population (20,100)
Compared random annotation with active learning
Mutation and crossover rates = 0.6
Maximal number of iterations = 50
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 31 / 54
Eagle
Experiments and Results
Dailymed-Drugbank
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 32 / 54
Eagle
Experiments and Results
DBpedia-LinkedMDB
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 33 / 54
Eagle
Experiments and Results
DBLP-ACM
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 34 / 54
Eagle
Experiments and Results
Larger population leads to
Better results, yet
Longer runtimes
For most datasets, population size of 100 seems sufficient for most linked
data sets
EAGLE is more time-efficient than state of the art
337s for ACM-DBLP (n=100) vs.
1553s for Marlin (ADTree)
2196s for Marlin (SVM)
4320s for Febrl (SVM)
Active learning clearly outperforms random annotation
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 35 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 36 / 54
Coala
Correlation-Aware Active Learning of Link Specifications
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 37 / 54
Coala
Learning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
Coala
Learning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
Coala
Learning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
Coala
Learning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
Coala
Learning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
Coala
Learning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
Coala
Learning Complex Specifications
Insight
Choice of right example is key for learning
So far, only use of informativeness
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
Coala
Learning Complex Specifications
Insight
Choice of right example is key for learning
So far, only use of informativeness
Question
Can we do better by using more information?
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
Coala
Learning Complex Specifications
Insight
Choice of right example is key for learning
So far, only use of informativeness
Question
Can we do better by using more information?
Higher F-measure
Often slower
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
Coala Approach
Basic Idea
Use similarity of link candidates when selecting most informative examples
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
Coala Approach
Basic Idea
Use similarity of link candidates when selecting most informative examples
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
Coala Approach
Basic Idea
Use similarity of link candidates when selecting most informative examples
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
Coala
Similarity of Candidates
Link candidate x = (s, t) can be regarded as vector
(σ1(x), . . . , σn(x)) ∈ [0, 1]n
.
Similarity of link candidates x and y:
sim(x, y) =
1
1 +
n
i=1
(σi (x) − σi (y))2
. (3)
Allows exploiting both intra- and inter-class similarity
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 42 / 54
Coala
Graph Clustering
Rationale: Use intra-class similarity
Approach
Cluster elements of S+
and S−
independently
Choose one element per cluster as representative
Present oracle with most informative representatives
0.8
0.9
0.8
S+
S-
0.8
0.9
0.8
0.25
0.25
0.9
0.8
0.8
0.8
0.25
a
b
c
d
e
d
f g
h
i
k
l
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 43 / 54
Coala
BorderFlow
G = (V , E, ω) with V = S+
or V = S−
ω(x, y) = sim(x, y)
Keep best ec edges for each x ∈ V
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 44 / 54
Coala
BorderFlow
Seed-based algorithm
Goal: Maximize borderflow ratio bf (X) = Ω(b(X),X)
Ω(b(X),n(X))
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
Coala
BorderFlow
Seed-based algorithm
Goal: Maximize borderflow ratio bf (X) = Ω(b(X),X)
Ω(b(X),n(X))
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
Coala
BorderFlow
Seed-based algorithm
Goal: Maximize borderflow ratio bf (X) = Ω(b(X),X)
Ω(b(X),n(X))
http://sourceforge.net/projects/cugar-framework/
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
Coala
BorderFlow
Seed-based algorithm
Goal: Maximize borderflow ratio bf (X) = Ω(b(X),X)
Ω(b(X),n(X))
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54
Coala
BorderFlow
Seed-based algorithm
Goal: Maximize borderflow ratio bf (X) = Ω(b(X),X)
Ω(b(X),n(X))
http://sourceforge.net/projects/cugar-framework/
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54
Coala
Spreading Activation
Rationale: Use both inter- and intra-class similarity
Approach
M0 : mij = sim(xi , xj ) with (xi , xj ) ∈ (S+
∪ S−
)2
A0 : ai = ifm(xi )
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
Coala
Spreading Activation
Rationale: Use both inter- and intra-class similarity
Approach
M0 : mij = sim(xi , xj ) with (xi , xj ) ∈ (S+
∪ S−
)2
A0 : ai = ifm(xi )
At = At−1 + Mt−1At−1 (spread activation)
At = At / max(At ) (normalize)
Mt = M
r
t−1 (weight decay)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
Coala
Spreading Activation
Rationale: Use both inter- and intra-class similarity
Approach
M0 : mij = sim(xi , xj ) with (xi , xj ) ∈ (S+
∪ S−
)2
A0 : ai = ifm(xi )
At = At−1 + Mt−1At−1 (spread activation)
At = At / max(At ) (normalize)
Mt = M
r
t−1 (weight decay)
3 iterations 3.9*10-3
0.97
0.691
0.73
1.5*10-5
3.9*10-3
1.5*10-5
3.9*10-3
S+
S-
0.8
0.80.9
0.9
0.25
0.5 0.5
0.25
0.5
S+
S-
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
Coala Evaluation
Experimental Setup
Used EAGLE as active learning approach
Mutation and crossover rate = 0.6
Selection rate = 0.7
Not deterministic ⇒ Ran each experiment 5 times
5 queries to oracle per iteration
10 iterations overall
2 populations sizes: 20 and 100
50 generations between iterations
Two real-world and three synthetic datasets
Single thread of a server (JDK1.7, Ubuntu 10.0.4, AMD Opteron 2GHz,
2GB/Experiment)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 48 / 54
Coala Evaluation
Parameters for WD
Ran experiments on DBLP-ACM
Population = 20
r ∈ {2, 4, 8, 16, 32}
10 20 30 40 50 60 70 80 90 100
0.5
0.6
0.7
0.8
0.9
1
F-score
0
200
400
600
800
1,000
runtimeinseconds
f(2) f(4) f(8) f(16) f(32)
d(2) d(4) d(8) d(16) d(32)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 49 / 54
Coala Evaluation
Parameters for CL
Ran experiments on DBLP-ACM
Population = 20
ec ∈ {1, 2, 3, 4, 5}
10 20 30 40 50 60 70 80 90 100
0.5
0.6
0.7
0.8
0.9
1
F-score
0
500
1,000
1,500
2,000
runtimeinseconds
f(1) f(2) f(3) f(4) f(5)
d(1) d(2) d(3) d(4) d(5)
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 50 / 54
Coala Evaluation
F-Scores
Population = 100, final values
Better results, yet unclear when to use WD or CL
DataSet EAGLE WD CL
Abt 0.19±0.04 0.25±0.04 0.23±0.04
DBLP 0.91±0.03 0.96±0.01 0.96±0.02
Person1 0.86±0.02 0.89±0.01 0.81±0.18
Person2 0.74±0.03 0.71±0.08 0.77±0.03
Restaurant 0.89±0.0 0.86±0.02 0.89±0.0
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 51 / 54
Table of Contents
1 Introduction
2 Raven
3 Eagle
4 Coala
5 Summary and Conclusion
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 52 / 54
Summary and Conclusion
Large number of challenges to
learning accurate specifications
1 Reduce labeling effort
⇒ Active learning
2 Learn complex specifications
⇒ Genetic programming
3 Learn specifications efficienty
⇒ See previous slides
Challenges include
1 Determinism
2 Deep learning
3 Self-checking
4 . . .
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 53 / 54
Acknowledgment
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 54 / 54

Link Discovery Tutorial Part II: Accuracy

  • 1.
    Link Discovery Tutorial PartII: Accuracy Axel-Cyrille Ngonga Ngomo(1) , Irini Fundulaki(2) , Mohamed Ahmed Sherif(1) (1) Institute for Applied Informatics, Germany (2) FORTH, Greece October 18th, 2016 Kobe, Japan Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 1 / 54
  • 2.
    Table of Contents 1Introduction 2 Raven 3 Eagle 4 Coala 5 Summary and Conclusion Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 2 / 54
  • 3.
    Table of Contents 1Introduction 2 Raven 3 Eagle 4 Coala 5 Summary and Conclusion Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 3 / 54
  • 4.
    Introduction Link Discovery asClassification Task Definition (Declarative Link Discovery) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : R(s, t)} Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ} Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
  • 5.
    Introduction Link Discovery asClassification Task Definition (Declarative Link Discovery) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : R(s, t)} Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ} Definition (Classification perspective) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : C(s, t) = +1} Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
  • 6.
    Introduction Link Discovery asClassification Task Definition (Declarative Link Discovery) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : R(s, t)} Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ} Definition (Classification perspective) Given sets S and T of resources and relation R Find M = {(s, t) ∈ S × T : C(s, t) = +1} Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ Classical machine learning problem [Ngo+11; NL12] Dedicated techniques perform better Unsupervised, active and unsupervised techniques possible Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54
  • 7.
    Introduction Challenge Challenges 1 Creation oflabeled training data tedious 2 Need automated means for automatic class and property matching 3 Need for efficient execution of link specifications 4 Dedicated machine learning approaches necessary Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54
  • 8.
    Introduction Challenge Challenges 1 Creation oflabeled training data tedious 2 Need automated means for automatic class and property matching 3 Need for efficient execution of link specifications 4 Dedicated machine learning approaches necessary Solutions 1 Use active learning approach for link discovery 2 Rely on hospital/resident algorithm 3 See previous section 4 Topic of this section Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54
  • 9.
    Table of Contents 1Introduction 2 Raven 3 Eagle 4 Coala 5 Summary and Conclusion Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 6 / 54
  • 10.
    RAVEN Approach Definition (Classification perspective) Givensets S and T of resources and relation R Find M = {(s, t) ∈ S × T : C(s, t) = +1} Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
  • 11.
    RAVEN Approach Definition (Classification perspective) Givensets S and T of resources and relation R Find M = {(s, t) ∈ S × T : C(s, t) = +1} Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ Learning classifier C involves learning 1 Two sets of restrictions that specify the sets S resp. T, 2 the components σ1 . . . σn of a complex similarity measure σ 3 a set of thresholds θ1, ..., θn for σ1, . . . , σn Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
  • 12.
    RAVEN Approach Definition (Classification perspective) Givensets S and T of resources and relation R Find M = {(s, t) ∈ S × T : C(s, t) = +1} Here, C(s, t) = +1 ↔ σ(s, t) ≥ θ Learning classifier C involves learning 1 Two sets of restrictions that specify the sets S resp. T, 2 the components σ1 . . . σn of a complex similarity measure σ 3 a set of thresholds θ1, ..., θn for σ1, . . . , σn Assumptions Restrictions are class restrictions Classifier shape is given (e.g., linear combination) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54
  • 13.
    RAVEN Approach Class and PropertyRestrictions Define class similarity function Solve corresponding hospital-resident problem Based on extension of stable marriage problem Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54
  • 14.
    RAVEN Approach Class and PropertyRestrictions Define class similarity function Solve corresponding hospital-resident problem Based on extension of stable marriage problem Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54
  • 15.
    RAVEN Approach Class and PropertyRestrictions Define class similarity function Solve corresponding hospital-resident problem Based on extension of stable marriage problem Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 9 / 54
  • 16.
    RAVEN Approach Class and PropertyRestrictions Define class similarity function Solve corresponding hospital-resident problem Based on extension of stable marriage problem Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 10 / 54
  • 17.
    RAVEN Approach Class and PropertyRestrictions Define class similarity function Solve corresponding hospital-resident problem Based on extension of stable marriage problem Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 11 / 54
  • 18.
    RAVEN Approach Class Restrictions Similarity function Stringsimilarity Number of shared property values amongst instances . . . Solve corresponding hospital-resident problem Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54
  • 19.
    RAVEN Approach Class Restrictions Similarity function Stringsimilarity Number of shared property values amongst instances . . . Solve corresponding hospital-resident problem Source Target S T Drugbank Disesome Targets Genes Sider Diseasome Side-Effect Diseases DBpedia Dailymed Organization Organization Sider Dailymed Drugs Offer Drugbank DBpedia Targets Protein Property mapping similar Leads to σ1 . . . σn Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54
  • 20.
    RAVEN Approach Learning Threshold Active perceptronlearning Begin with educated guess, e.g., θi = 0.9 Update thresholds based on most informative examples Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54
  • 21.
    RAVEN Approach Learning Threshold Active perceptronlearning Begin with educated guess, e.g., θi = 0.9 Update thresholds based on most informative examples Guess initial classifier Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54
  • 22.
    RAVEN Approach Learning Threshold Active perceptronlearning Begin with educated guess, e.g., θi = 0.9 Update thresholds based on most informative examples Pick most informative examples, i.e., unclassified and closest to boundary Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 14 / 54
  • 23.
    RAVEN Approach Learning Threshold Active perceptronlearning Begin with educated guess, e.g., θi = 0.9 Update thresholds based on most informative examples Ask for classification from oracle Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 15 / 54
  • 24.
    RAVEN Approach Learning Threshold Active perceptronlearning Begin with educated guess, e.g., θi = 0.9 Update thresholds based on most informative examples Update classifier Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 16 / 54
  • 25.
    RAVEN Evaluation Evaluation on Diseases(Diseasome to DBpedia) Learning rate = 0.02 10 questions/iteration F-measure of up to 92% 1 3 5 7 9 11 13 15 17 19 21 23 25 Number of iterations 0 10 20 30 40 50 60 70 80 90 100 P (%) R (%) F (%) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 17 / 54
  • 26.
    RAVEN Evaluation Learning rate =0.02 10 questions/iteration 1 3 5 7 9 11 13 15 17 19 21 23 25 Number of iterations 10 100 1000 Runtime(ms) Diseases Drugs Side Effects Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 18 / 54
  • 27.
    Table of Contents 1Introduction 2 Raven 3 Eagle 4 Coala 5 Summary and Conclusion Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 19 / 54
  • 28.
    Eagle Efficient Active Learningof Link Specifications using Genetic Programming Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 20 / 54
  • 29.
    Eagle Efficient Active Learningof Link Specifications using Genetic Programming Eagle Provides means for automatic class and property matching Minimizes human labeling effort through active learning Allow for learning generic specs (limitation of RAVEN) Similar approaches [NIK+12; ISE+12] Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 21 / 54
  • 30.
    Eagle Formal Definition Same formalsetting as RAVEN Two sets of restrictions resp. that specify the sets S resp. T, a specification of mapping properties (p1, q1), . . . , (pn, qn) for the elements of S and T and a specification of a complex similarity measure σ as the combination of several atomic similarity measures σ1, . . . , σn and of a set of thresholds θ1, . . . , θn such that θi is the threshold for σi . Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 22 / 54
  • 31.
    Eagle LS example Can learngeneric classifier type f (levenshtein(:title, :title), 0.53) f (cosine(:venue, :year), 1.00) f (jaccard(:title, :authors), 0.43) f (trigrams(:title, :year), 1.00) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 23 / 54
  • 32.
    Eagle Idea & Goal Eagle Idea:Specifications are trees Goal: Learn elements of trees through genetic operations until best LS is found (m4, θ4) (m2, θ2) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 24 / 54
  • 33.
    Eagle Algorithm Step 1:Generate initial population Random process (property pairs, thresholds) Compute fitness Fitness = F-Measure w.r.t known data (m1, θ1) p1 q1 (m2, θ2) p2 q2 (m3, θ3) p3 q3 (m4, θ4) (m5, θ5) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 25 / 54
  • 34.
    Eagle Algorithm Step 2:Evolve population Tournament between two individuals Two operators: Mutation and crossover (m1, θ1) p1 q1 (m3, θ3) p2 q2 (m2, θ2) p3 q3 (m4, θ4) (m5, θ5) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
  • 35.
    Eagle Algorithm Step 2:Evolve population Tournament between two individuals Two operators: Mutation and crossover (m1, θ1) p1 q1 (m3, θ3) p2 q2 (m2, θ2) p3 q3 (m4, θ4) (m5, θ5) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
  • 36.
    Eagle Algorithm Step 2:Evolve population Tournament between two individuals Two operators: Mutation and crossover (m1, θ1) p1 q1 (m3, θ3) p2 q2 (m2, θ2) p3 q3 (m4, θ4) (m5, θ5) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
  • 37.
    Eagle Algorithm Step 2:Evolve population Tournament between two individuals Two operators: Mutation and crossover p1 q1 (m1, θ1 + α) (m3, θ3) p2 q2 (m2, θ2) p3 q3 (m4, θ4) (m5, θ5) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
  • 38.
    Eagle Algorithm Step 2:Evolve population Tournament between two individuals Two operators: Mutation and crossover p1 q1 (m1, θ1 + α) (m3, θ3) p2 q2 (m2, θ2) p3 q3 (m4, θ4) p3 q3 (m2, θ2) p3 q3 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54
  • 39.
    Eagle Algorithm Step 3:Computation of most informative links Previous approaches define amount of information of link as closeness to the decision boundary Here, use disagreement amongst elements of population of size n δ((s, t)) = (n − |Mt i : (s, t) ∈ Mi )|)(n − |Mt i : (s, t) /∈ Mi |) Function is maximal when n 2 count (s, t) as positive and n 2 as negative Can be modeled with other functions such as entropy Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 27 / 54
  • 40.
    Eagle Algorithm Step 4:Active Learning Compute d((s, t)) for all (s, t) returned by a LS Pick k most informative Require labeling from user Update list of positive and negative examples (m1, θ1 + α) p1 q1 (m2, θ2) p3 q3 (m3, θ3) p2 q2 (m4, θ4) (m2, θ2) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 28 / 54
  • 41.
    Eagle Algorithm Step 5:Remove least fit elements Fitness = F-Measure w.r.t known data (m1, θ1 + α) p1 q1 (m2, θ2) p3 q3 (m3, θ3) p2 q2 (m4, θ4) (m2, θ2) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
  • 42.
    Eagle Algorithm Step 5:Remove least fit elements Fitness = F-Measure w.r.t known data (m1, θ1 + α) p1 q1 (m2, θ2) p3 q3 (m3, θ3) p2 q2 (m4, θ4) (m2, θ2) p3 q3 p2 q2 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
  • 43.
    Eagle Algorithm Step 5:Remove least fit elements Fitness = F-Measure w.r.t known data (m1, θ1 + α) p1 q1 (m2, θ2) p3 q3 (m3, θ3) p2 q2 (m4, θ4) (m2, θ2) p3 q3 p2 q2 If termination conditions not met, goto Step 2 Else terminate and pick fittest LS Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54
  • 44.
    Eagle Algorithm Unsupervised learning Measuredegree of monogamy of links [NIK+12] Only works for 1-1 relations, e.g., owl:sameAs P(M) = |{s|∃t : (s, t) ∈ M}| s |{t : (s, t) ∈ M}| , R(M) = |{t|∃s : (s, t) ∈ M}| t |{s : (s, t) ∈ M}| . (1) Fβ (M) = (1 + β2 ) Pd (M)Rd (M) β2Pd (M) + Rd (M) (2) s1 s2 s3 t1 t2 t3 t4 link link link link P = 3/4, R = 2/4, F = 3/5. Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 30 / 54
  • 45.
    Eagle Experiments and Results ExperimentalSetup: Compared batch learning and genetic programming Used 3 different data sets 1 Dailymed-Drugbank (LATC) 2 DBpedia-LinkedMDB (LATC) 3 DBLP-ACM Compared different sizes of population (20,100) Compared random annotation with active learning Mutation and crossover rates = 0.6 Maximal number of iterations = 50 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 31 / 54
  • 46.
    Eagle Experiments and Results Dailymed-Drugbank NgongaNgomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 32 / 54
  • 47.
    Eagle Experiments and Results DBpedia-LinkedMDB NgongaNgomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 33 / 54
  • 48.
    Eagle Experiments and Results DBLP-ACM NgongaNgomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 34 / 54
  • 49.
    Eagle Experiments and Results Largerpopulation leads to Better results, yet Longer runtimes For most datasets, population size of 100 seems sufficient for most linked data sets EAGLE is more time-efficient than state of the art 337s for ACM-DBLP (n=100) vs. 1553s for Marlin (ADTree) 2196s for Marlin (SVM) 4320s for Febrl (SVM) Active learning clearly outperforms random annotation Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 35 / 54
  • 50.
    Table of Contents 1Introduction 2 Raven 3 Eagle 4 Coala 5 Summary and Conclusion Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 36 / 54
  • 51.
    Coala Correlation-Aware Active Learningof Link Specifications Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 37 / 54
  • 52.
    Coala Learning Complex Specifications Supervised(mostly active, e.g., RAVEN, EAGLE, SILK) Unsupervised (e.g., KnoFuss, EUCLID, EAGLE) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
  • 53.
    Coala Learning Complex Specifications Supervised(mostly active, e.g., RAVEN, EAGLE, SILK) Unsupervised (e.g., KnoFuss, EUCLID, EAGLE) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
  • 54.
    Coala Learning Complex Specifications Supervised(mostly active, e.g., RAVEN, EAGLE, SILK) Unsupervised (e.g., KnoFuss, EUCLID, EAGLE) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54
  • 55.
    Coala Learning Complex Specifications Supervised(mostly active, e.g., RAVEN, EAGLE, SILK) Unsupervised (e.g., KnoFuss, EUCLID, EAGLE) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
  • 56.
    Coala Learning Complex Specifications Supervised(mostly active, e.g., RAVEN, EAGLE, SILK) Unsupervised (e.g., KnoFuss, EUCLID, EAGLE) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
  • 57.
    Coala Learning Complex Specifications Supervised(mostly active, e.g., RAVEN, EAGLE, SILK) Unsupervised (e.g., KnoFuss, EUCLID, EAGLE) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54
  • 58.
    Coala Learning Complex Specifications Insight Choiceof right example is key for learning So far, only use of informativeness Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
  • 59.
    Coala Learning Complex Specifications Insight Choiceof right example is key for learning So far, only use of informativeness Question Can we do better by using more information? Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
  • 60.
    Coala Learning Complex Specifications Insight Choiceof right example is key for learning So far, only use of informativeness Question Can we do better by using more information? Higher F-measure Often slower Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54
  • 61.
    Coala Approach Basic Idea Usesimilarity of link candidates when selecting most informative examples Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
  • 62.
    Coala Approach Basic Idea Usesimilarity of link candidates when selecting most informative examples Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
  • 63.
    Coala Approach Basic Idea Usesimilarity of link candidates when selecting most informative examples Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54
  • 64.
    Coala Similarity of Candidates Linkcandidate x = (s, t) can be regarded as vector (σ1(x), . . . , σn(x)) ∈ [0, 1]n . Similarity of link candidates x and y: sim(x, y) = 1 1 + n i=1 (σi (x) − σi (y))2 . (3) Allows exploiting both intra- and inter-class similarity Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 42 / 54
  • 65.
    Coala Graph Clustering Rationale: Useintra-class similarity Approach Cluster elements of S+ and S− independently Choose one element per cluster as representative Present oracle with most informative representatives 0.8 0.9 0.8 S+ S- 0.8 0.9 0.8 0.25 0.25 0.9 0.8 0.8 0.8 0.25 a b c d e d f g h i k l Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 43 / 54
  • 66.
    Coala BorderFlow G = (V, E, ω) with V = S+ or V = S− ω(x, y) = sim(x, y) Keep best ec edges for each x ∈ V Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 44 / 54
  • 67.
    Coala BorderFlow Seed-based algorithm Goal: Maximizeborderflow ratio bf (X) = Ω(b(X),X) Ω(b(X),n(X)) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
  • 68.
    Coala BorderFlow Seed-based algorithm Goal: Maximizeborderflow ratio bf (X) = Ω(b(X),X) Ω(b(X),n(X)) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
  • 69.
    Coala BorderFlow Seed-based algorithm Goal: Maximizeborderflow ratio bf (X) = Ω(b(X),X) Ω(b(X),n(X)) http://sourceforge.net/projects/cugar-framework/ Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54
  • 70.
    Coala BorderFlow Seed-based algorithm Goal: Maximizeborderflow ratio bf (X) = Ω(b(X),X) Ω(b(X),n(X)) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54
  • 71.
    Coala BorderFlow Seed-based algorithm Goal: Maximizeborderflow ratio bf (X) = Ω(b(X),X) Ω(b(X),n(X)) http://sourceforge.net/projects/cugar-framework/ Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54
  • 72.
    Coala Spreading Activation Rationale: Useboth inter- and intra-class similarity Approach M0 : mij = sim(xi , xj ) with (xi , xj ) ∈ (S+ ∪ S− )2 A0 : ai = ifm(xi ) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
  • 73.
    Coala Spreading Activation Rationale: Useboth inter- and intra-class similarity Approach M0 : mij = sim(xi , xj ) with (xi , xj ) ∈ (S+ ∪ S− )2 A0 : ai = ifm(xi ) At = At−1 + Mt−1At−1 (spread activation) At = At / max(At ) (normalize) Mt = M r t−1 (weight decay) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
  • 74.
    Coala Spreading Activation Rationale: Useboth inter- and intra-class similarity Approach M0 : mij = sim(xi , xj ) with (xi , xj ) ∈ (S+ ∪ S− )2 A0 : ai = ifm(xi ) At = At−1 + Mt−1At−1 (spread activation) At = At / max(At ) (normalize) Mt = M r t−1 (weight decay) 3 iterations 3.9*10-3 0.97 0.691 0.73 1.5*10-5 3.9*10-3 1.5*10-5 3.9*10-3 S+ S- 0.8 0.80.9 0.9 0.25 0.5 0.5 0.25 0.5 S+ S- Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54
  • 75.
    Coala Evaluation Experimental Setup UsedEAGLE as active learning approach Mutation and crossover rate = 0.6 Selection rate = 0.7 Not deterministic ⇒ Ran each experiment 5 times 5 queries to oracle per iteration 10 iterations overall 2 populations sizes: 20 and 100 50 generations between iterations Two real-world and three synthetic datasets Single thread of a server (JDK1.7, Ubuntu 10.0.4, AMD Opteron 2GHz, 2GB/Experiment) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 48 / 54
  • 76.
    Coala Evaluation Parameters forWD Ran experiments on DBLP-ACM Population = 20 r ∈ {2, 4, 8, 16, 32} 10 20 30 40 50 60 70 80 90 100 0.5 0.6 0.7 0.8 0.9 1 F-score 0 200 400 600 800 1,000 runtimeinseconds f(2) f(4) f(8) f(16) f(32) d(2) d(4) d(8) d(16) d(32) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 49 / 54
  • 77.
    Coala Evaluation Parameters forCL Ran experiments on DBLP-ACM Population = 20 ec ∈ {1, 2, 3, 4, 5} 10 20 30 40 50 60 70 80 90 100 0.5 0.6 0.7 0.8 0.9 1 F-score 0 500 1,000 1,500 2,000 runtimeinseconds f(1) f(2) f(3) f(4) f(5) d(1) d(2) d(3) d(4) d(5) Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 50 / 54
  • 78.
    Coala Evaluation F-Scores Population =100, final values Better results, yet unclear when to use WD or CL DataSet EAGLE WD CL Abt 0.19±0.04 0.25±0.04 0.23±0.04 DBLP 0.91±0.03 0.96±0.01 0.96±0.02 Person1 0.86±0.02 0.89±0.01 0.81±0.18 Person2 0.74±0.03 0.71±0.08 0.77±0.03 Restaurant 0.89±0.0 0.86±0.02 0.89±0.0 Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 51 / 54
  • 79.
    Table of Contents 1Introduction 2 Raven 3 Eagle 4 Coala 5 Summary and Conclusion Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 52 / 54
  • 80.
    Summary and Conclusion Largenumber of challenges to learning accurate specifications 1 Reduce labeling effort ⇒ Active learning 2 Learn complex specifications ⇒ Genetic programming 3 Learn specifications efficienty ⇒ See previous slides Challenges include 1 Determinism 2 Deep learning 3 Self-checking 4 . . . Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 53 / 54
  • 81.
    Acknowledgment This work wassupported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227). Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 54 / 54