SlideShare a Scribd company logo
COLIBRI
Unsupervised Link Discovery Through Knowledge Base
Repair
Axel-Cyrille Ngonga Ngomo Mohamed Ahmed Sherif Klaus Lyko
ESWC 2014, Crete, Greece
Outline Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Ngonga Ngomo· Sherif · Lyko Colibri 2/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Why Link Discovery?
1 Fourth principle
2 Links are central for
• Cross-ontology QA
• Data Integration
• Reasoning
• Federated Queries
• ...
Ngonga Ngomo· Sherif · Lyko Colibri 2/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Why is it difficult?
• Time complexity
• Large number of triples
• Quadratic runtime
• Complexity of specifications
• Combination of several attributes required for high precision
• Tedious discovery of most adequate mapping
• Dataset-dependent similarity functions
Ngonga Ngomo· Sherif · Lyko Colibri 3/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Solution
1 Use unsupervised link discovery
• No need for training data
• Minimizes load on user
2 Combine results of linking tasks
over n > 2 knowledge bases
• Make explicit use of the topology
of the Data Web
3 Repair noisy data to improve link
discovery
• Address different quality of
datasets across the Data Web
Ngonga Ngomo· Sherif · Lyko Colibri 4/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Ngonga Ngomo· Sherif · Lyko Colibri 5/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Colibri overview
Outline Motivation Approach Evaluation Conclusion and Future Work
Colibri overview
K1 K2
. . . Kn
Outline Motivation Approach Evaluation Conclusion and Future Work
Colibri overview
K1 K2
. . . Kn
Unsupervised LD
Outline Motivation Approach Evaluation Conclusion and Future Work
Colibri overview
K1 K2
. . . Kn
Unsupervised LD
Voting
Mappings
Outline Motivation Approach Evaluation Conclusion and Future Work
Colibri overview
K1 K2
. . . Kn
Unsupervised LD
Voting
Mappings
Repair
Voting matrices
Outline Motivation Approach Evaluation Conclusion and Future Work
Colibri overview
K1 K2
. . . Kn
Unsupervised LD
Voting
Mappings
Repair
Voting matrices
Repair instances
Ngonga Ngomo· Sherif · Lyko Colibri 5/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Key Concepts
• Mapping matrix
• M12 =


1 0 0
0 1 0
0 0 0


ex2:1
ex2:2
ex2:3
ex1:1
ex1:2
ex1:3
1
1
K2K1
Ngonga Ngomo· Sherif · Lyko Colibri 6/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Key Concepts
• Pseudo-F-measure as objective function
• P(Mij ) =
|links(Ki ,Mij )|+|links(Kj ,Mij )|
2|Mij |
• R(Mij ) =
|links(Ki ,Mij )|+|links(Kj ,Mij )|
|Ki |+|Kj |
• Fβ = (1 + β2) PR
β2P+R
Example:
• P(M12) = 1
• R(M12) = 2
3
• F1(M12) = 4
5
ex2:1
ex2:2
ex2:3
ex1:1
ex1:2
ex1:3
1
1
K2K1
Ngonga Ngomo· Sherif · Lyko Colibri 7/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 1: Unsupervised Link Discovery
K1 K2
. . . Kn
Unsupervised LD
VotingRepair
Mappings
Voting matrices
Repair instances
Ngonga Ngomo· Sherif · Lyko Colibri 8/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 1: Unsupervised Link Discovery
• Link all pairs (Ki , Kj ) using any unsupervised link discovery
approach
• Here, Euclid
• Specifications are points in a similarity space
• Find accurate specification by using hierarchical grid search
• Detect specification which maximizes Fβ
Ngonga Ngomo· Sherif · Lyko Colibri 9/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 1: Unsupervised Link Discovery
• Mapping matrices
• M12 =


1 0 0
0 1 0
0 0 0


• M13 =


1 0 1
0 0.5 0
0 0 0.5


• M23 =


1 0 0
0 0.5 0
0 0 0.5


ex1:1 ex1:2 ex1:3
ex2:1
ex2:2
ex2:3
ex3:1
ex3:2
ex3:3
1
1
1
1
0.5
0.5
1
0.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 10/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
K1 K2
. . . Kn
Unsupervised LD
VotingRepair
Mappings
Voting matrices
Repair instances
Ngonga Ngomo· Sherif · Lyko Colibri 11/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
Ki s
Kjt
1
Kk z
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
Ki s
Kjt
1
Kk z
1
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
Ki s
Kjt
1
Kk z
1
1
Ngonga Ngomo· Sherif · Lyko Colibri 12/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Vij = 1
n−1


Mij +
n
k=1
k=i,j
MikMkj



Ngonga Ngomo· Sherif · Lyko Colibri 13/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Vij = 1
n−1


Mij +
n
k=1
k=i,j
MikMkj



• Mapping matrices
• M12 =


1 0 0
0 1 0
0 0 0


• M13 =


1 0 1
0 0.5 0
0 0 0.5


• M23 =


1 0 0
0 0.5 0
0 0 0.5


ex1:1 ex1:2 ex1:3
ex2:1
ex2:2
ex2:3
ex3:1
ex3:2
ex3:3
1
1
1
1
0.5
0.5
1
0.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 13/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Vij = 1
n−1


Mij +
n
k=1
k=i,j
MikMkj



• Mapping matrices
• M12 =


1 0 0
0 1 0
0 0 0


• M13 =


1 0 1
0 0.5 0
0 0 0.5


• M23 =


1 0 0
0 0.5 0
0 0 0.5


ex1:1 ex1:2 ex1:3
ex2:1
ex2:2
ex2:3
ex3:1
ex3:2
ex3:3
1
1
1
1
0.5
0.5
1
0.5
0.5
K3
K1
K2
V12 =


1 0 0.25
0 0.625 0
0 0 0.125


Ngonga Ngomo· Sherif · Lyko Colibri 13/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Voting matrices
• V12 =


1 0 0.25
0 0.625 0
0 0 0.125


• V13 =


1 0 0.5
0 0.5 0
0 0 0.25


• V23 =


1 0 0
0 0.5 0
0 0 0.25


• Post-processed matrices
• ˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2 ex1:3
ex2:1
ex2:2
ex2:3
ex3:1
ex3:2
ex3:3
1
1
1
1
0.5
0.5
1
0.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 14/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Assume links in ˜Vij to be correct
• ˜vij = 1 → All matrices agree on
how to link (Ki , Kj )
e.g., ˜V12(ex1:1, ex2:1)
• For all ˜vij < 1 assume either
1 Missing links
e.g., ˜V12(ex1:3, ex2:3) not
contained in M12
2 Weak links
e.g., ˜V12(ex1:2, ex2:2) < 1 is
due to M13(ex1:2, ex3:2) and
M32(ex3:2, ex2:2) being 0.5
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2 ex1:3
ex2:1
ex2:2
ex2:3
ex3:1
ex3:2
ex3:3
1
1
1
1
0.5
0.5
1
0.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 15/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Assume links in ˜Vij to be correct
• ˜vij = 1 → All matrices agree on
how to link (Ki , Kj )
e.g., ˜V12(ex1:1, ex2:1)
• For all ˜vij < 1 assume either
1 Missing links
e.g., ˜V12(ex1:3, ex2:3) not
contained in M12
2 Weak links
e.g., ˜V12(ex1:2, ex2:2) < 1 is
due to M13(ex1:2, ex3:2) and
M32(ex3:2, ex2:2) being 0.5
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1ex1:1 ex1:2 ex1:3
ex2:1ex2:1
ex2:2
ex2:3
ex3:1ex3:1
ex3:2
ex3:3
11
11
1
1
0.5
0.5
11
0.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 15/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Assume links in ˜Vij to be correct
• ˜vij = 1 → All matrices agree on
how to link (Ki , Kj )
e.g., ˜V12(ex1:1, ex2:1)
• For all ˜vij < 1 assume either
1 Missing links
e.g., ˜V12(ex1:3, ex2:3) not
contained in M12
2 Weak links
e.g., ˜V12(ex1:2, ex2:2) < 1 is
due to M13(ex1:2, ex3:2) and
M32(ex3:2, ex2:2) being 0.5
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2 ex1:3ex1:3
ex2:1
ex2:2
ex2:3ex2:3
ex3:1
ex3:2
ex3:3
1
1
1
1
0.5
0.5
1
0.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 15/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 2: Voting
• Assume links in ˜Vij to be correct
• ˜vij = 1 → All matrices agree on
how to link (Ki , Kj )
e.g., ˜V12(ex1:1, ex2:1)
• For all ˜vij < 1 assume either
1 Missing links
e.g., ˜V12(ex1:3, ex2:3) not
contained in M12
2 Weak links
e.g., ˜V12(ex1:2, ex2:2) < 1 is
due to M13(ex1:2, ex3:2) and
M32(ex3:2, ex2:2) being 0.5
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2ex1:2 ex1:3
ex2:1
ex2:2ex2:2
ex2:3
ex3:1
ex3:2ex3:2
ex3:3
1
1
1
1
0.50.5
0.5
1
0.50.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 15/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 3: Repair
K1 K2
. . . Kn
Unsupervised LD
VotingRepair
Mappings
Voting matrices
Repair instances
Ngonga Ngomo· Sherif · Lyko Colibri 16/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 3: Repair
• Goal: Repair instance data so
as to improve ˜vij < 1
• Link to be repaired is
(ex1:2, ex2:2).
• Reason for this link:
• rs = ex1:2 and
• rt = ex3:2.
• Computing average similarity:
• ¯σ(ex1:2) = 0.75 while
• ¯σ(ex3:2) = 0.5.
• Colibri overwrite the values
of ex3:2 with those of ex1:2.
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2 ex1:3
ex2:1
ex2:2
ex2:3
ex3:1
ex3:2
ex3:3
1
1
1
1
0.5
0.5
1
0.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 17/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 3: Repair
• Goal: Repair instance data so
as to improve ˜vij < 1
• Link to be repaired is
(ex1:2, ex2:2).
• Reason for this link:
• rs = ex1:2 and
• rt = ex3:2.
• Computing average similarity:
• ¯σ(ex1:2) = 0.75 while
• ¯σ(ex3:2) = 0.5.
• Colibri overwrite the values
of ex3:2 with those of ex1:2.
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2ex1:2 ex1:3
ex2:1
ex2:2ex2:2
ex2:3
ex3:1
ex3:2ex3:2
ex3:3
1
1
1
1
0.50.5
0.5
1
0.50.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 17/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 3: Repair
• Goal: Repair instance data so
as to improve ˜vij < 1
• Link to be repaired is
(ex1:2, ex2:2).
• Reason for this link:
• rs = ex1:2 and
• rt = ex3:2.
• Computing average similarity:
• ¯σ(ex1:2) = 0.75 while
• ¯σ(ex3:2) = 0.5.
• Colibri overwrite the values
of ex3:2 with those of ex1:2.
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2ex1:2 ex1:3
ex2:1
ex2:2ex2:2
ex2:3
ex3:1
ex3:2ex3:2
ex3:3
1
1
1
11
0.50.5
0.5
1
0.50.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 17/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Step 3: Repair
• Goal: Repair instance data so
as to improve ˜vij < 1
• Link to be repaired is
(ex1:2, ex2:2).
• Reason for this link:
• rs = ex1:2 and
• rt = ex3:2.
• Computing average similarity:
• ¯σ(ex1:2) = 0.75 while
• ¯σ(ex3:2) = 0.5.
• Colibri overwrite the values
of ex3:2 with those of ex1:2.
˜V12 =


1 0 0
0 0.625 0
0 0 0.125


ex1:1 ex1:2ex1:2 ex1:3
ex2:1
ex2:2ex2:2
ex2:3
ex3:1
ex3:2ex3:2
ex3:3
1
1
1
1
0.50.5
0.5
1
0.50.5
0.5
K3
K1
K2
Ngonga Ngomo· Sherif · Lyko Colibri 17/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Ngonga Ngomo· Sherif · Lyko Colibri 18/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Benchmark Generation Approach
• So far, no benchmark for linking n > 2 knowledge bases
• Benchmark generation approach (Ferrara et al., 2011)
• Generated m − 1 copies of initial dataset K1
• Alteration operators:
• Misspellings
• Abbreviations
• Word permutations
• Alteration strategy:
• Pick random resource according to alteration probability
• Pick random operator
Ngonga Ngomo· Sherif · Lyko Colibri 18/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Experimental Setup
• Datasets:
• Two synthetic datasets (OAEI2010)
• Three real-world datasets (Koepcke et al., 2010)
• Colibri:
• Maximal number of iterations = 10
• Number of knowledge bases = {3, 4, 5}
• Alteration probability ap = {10%, 20%, . . . , 50%}
• Repeat each experiment 5 times
Ngonga Ngomo· Sherif · Lyko Colibri 19/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Experimental Results (synthetic dataset)
KBs FEuclid FColibri Runtime (sec) Repaired links
3 0.89 0.98 0.4 43
4 0.90 1.00 0.9 35
5 0.88 1.00 1.3 34
• Restaurant dataset
• Average values after 10 iterations
• Alteration probability ap = 50%
Ngonga Ngomo· Sherif · Lyko Colibri 20/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Experimental Results (real-world dataset)
KBs FEuclid FColibri Runtime (sec) Repaired links
3 0.86 0.98 81.8 300
4 0.85 0.99 160.4 150
5 0.84 0.88 246.8 60
• Amazon dataset
• Average values after 10 iterations
• Alteration probability ap = 50%
Ngonga Ngomo· Sherif · Lyko Colibri 21/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Results on the Restaurants dataset
• Alteration probability
ap = 50%
• Knowledge bases = 5
iterationNr 2 4 6
0
20
40
60
80
100
Precision
Recall
F-Measure
Error rate
Full results at:
https://github.com/AKSW/LIMES/tree/master/
evaluationsResults/colibri
Ngonga Ngomo· Sherif · Lyko Colibri 22/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Ngonga Ngomo· Sherif · Lyko Colibri 23/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Conclusion and Future Work
• Conclusion
• Presented Colibri
• Improved F-measure of Euclid up to 14%
• Future Work
• Evaluation on other datasets
• Interactive scenarios (i.e., consult user before dataset repair)
• Combination with other unsupervised solutions (e.g., Eagle)
Ngonga Ngomo· Sherif · Lyko Colibri 23/24
Outline Motivation Approach Evaluation Conclusion and Future Work
Thank You!
Questions?
Mohamed Sherif
Augustusplatz 10
D-04109 Leipzig
sherif@informatik.uni-leipzig.de
http://aksw.org/MohamedSherif
http://limes.sf.net
Ngonga Ngomo· Sherif · Lyko Colibri 24/24

More Related Content

Recently uploaded

JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
ShivajiThube2
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
goswamiyash170123
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
The Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptxThe Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptx
DhatriParmar
 

Recently uploaded (20)

JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
The Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptxThe Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptx
 

Featured

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
marketingartwork
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
Skeleton Technologies
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
Project for Public Spaces & National Center for Biking and Walking
 

Featured (20)

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 

Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

  • 1. COLIBRI Unsupervised Link Discovery Through Knowledge Base Repair Axel-Cyrille Ngonga Ngomo Mohamed Ahmed Sherif Klaus Lyko ESWC 2014, Crete, Greece
  • 2. Outline Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Ngonga Ngomo· Sherif · Lyko Colibri 2/24
  • 3. Outline Motivation Approach Evaluation Conclusion and Future Work Why Link Discovery? 1 Fourth principle 2 Links are central for • Cross-ontology QA • Data Integration • Reasoning • Federated Queries • ... Ngonga Ngomo· Sherif · Lyko Colibri 2/24
  • 4. Outline Motivation Approach Evaluation Conclusion and Future Work Why is it difficult? • Time complexity • Large number of triples • Quadratic runtime • Complexity of specifications • Combination of several attributes required for high precision • Tedious discovery of most adequate mapping • Dataset-dependent similarity functions Ngonga Ngomo· Sherif · Lyko Colibri 3/24
  • 5. Outline Motivation Approach Evaluation Conclusion and Future Work Solution 1 Use unsupervised link discovery • No need for training data • Minimizes load on user 2 Combine results of linking tasks over n > 2 knowledge bases • Make explicit use of the topology of the Data Web 3 Repair noisy data to improve link discovery • Address different quality of datasets across the Data Web Ngonga Ngomo· Sherif · Lyko Colibri 4/24
  • 6. Outline Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Ngonga Ngomo· Sherif · Lyko Colibri 5/24
  • 7. Outline Motivation Approach Evaluation Conclusion and Future Work Colibri overview
  • 8. Outline Motivation Approach Evaluation Conclusion and Future Work Colibri overview K1 K2 . . . Kn
  • 9. Outline Motivation Approach Evaluation Conclusion and Future Work Colibri overview K1 K2 . . . Kn Unsupervised LD
  • 10. Outline Motivation Approach Evaluation Conclusion and Future Work Colibri overview K1 K2 . . . Kn Unsupervised LD Voting Mappings
  • 11. Outline Motivation Approach Evaluation Conclusion and Future Work Colibri overview K1 K2 . . . Kn Unsupervised LD Voting Mappings Repair Voting matrices
  • 12. Outline Motivation Approach Evaluation Conclusion and Future Work Colibri overview K1 K2 . . . Kn Unsupervised LD Voting Mappings Repair Voting matrices Repair instances Ngonga Ngomo· Sherif · Lyko Colibri 5/24
  • 13. Outline Motivation Approach Evaluation Conclusion and Future Work Key Concepts • Mapping matrix • M12 =   1 0 0 0 1 0 0 0 0   ex2:1 ex2:2 ex2:3 ex1:1 ex1:2 ex1:3 1 1 K2K1 Ngonga Ngomo· Sherif · Lyko Colibri 6/24
  • 14. Outline Motivation Approach Evaluation Conclusion and Future Work Key Concepts • Pseudo-F-measure as objective function • P(Mij ) = |links(Ki ,Mij )|+|links(Kj ,Mij )| 2|Mij | • R(Mij ) = |links(Ki ,Mij )|+|links(Kj ,Mij )| |Ki |+|Kj | • Fβ = (1 + β2) PR β2P+R Example: • P(M12) = 1 • R(M12) = 2 3 • F1(M12) = 4 5 ex2:1 ex2:2 ex2:3 ex1:1 ex1:2 ex1:3 1 1 K2K1 Ngonga Ngomo· Sherif · Lyko Colibri 7/24
  • 15. Outline Motivation Approach Evaluation Conclusion and Future Work Step 1: Unsupervised Link Discovery K1 K2 . . . Kn Unsupervised LD VotingRepair Mappings Voting matrices Repair instances Ngonga Ngomo· Sherif · Lyko Colibri 8/24
  • 16. Outline Motivation Approach Evaluation Conclusion and Future Work Step 1: Unsupervised Link Discovery • Link all pairs (Ki , Kj ) using any unsupervised link discovery approach • Here, Euclid • Specifications are points in a similarity space • Find accurate specification by using hierarchical grid search • Detect specification which maximizes Fβ Ngonga Ngomo· Sherif · Lyko Colibri 9/24
  • 17. Outline Motivation Approach Evaluation Conclusion and Future Work Step 1: Unsupervised Link Discovery • Mapping matrices • M12 =   1 0 0 0 1 0 0 0 0   • M13 =   1 0 1 0 0.5 0 0 0 0.5   • M23 =   1 0 0 0 0.5 0 0 0 0.5   ex1:1 ex1:2 ex1:3 ex2:1 ex2:2 ex2:3 ex3:1 ex3:2 ex3:3 1 1 1 1 0.5 0.5 1 0.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 10/24
  • 18. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting K1 K2 . . . Kn Unsupervised LD VotingRepair Mappings Voting matrices Repair instances Ngonga Ngomo· Sherif · Lyko Colibri 11/24
  • 19. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting Ki s Kjt 1 Kk z
  • 20. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting Ki s Kjt 1 Kk z 1
  • 21. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting Ki s Kjt 1 Kk z 1 1 Ngonga Ngomo· Sherif · Lyko Colibri 12/24
  • 22. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Vij = 1 n−1   Mij + n k=1 k=i,j MikMkj    Ngonga Ngomo· Sherif · Lyko Colibri 13/24
  • 23. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Vij = 1 n−1   Mij + n k=1 k=i,j MikMkj    • Mapping matrices • M12 =   1 0 0 0 1 0 0 0 0   • M13 =   1 0 1 0 0.5 0 0 0 0.5   • M23 =   1 0 0 0 0.5 0 0 0 0.5   ex1:1 ex1:2 ex1:3 ex2:1 ex2:2 ex2:3 ex3:1 ex3:2 ex3:3 1 1 1 1 0.5 0.5 1 0.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 13/24
  • 24. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Vij = 1 n−1   Mij + n k=1 k=i,j MikMkj    • Mapping matrices • M12 =   1 0 0 0 1 0 0 0 0   • M13 =   1 0 1 0 0.5 0 0 0 0.5   • M23 =   1 0 0 0 0.5 0 0 0 0.5   ex1:1 ex1:2 ex1:3 ex2:1 ex2:2 ex2:3 ex3:1 ex3:2 ex3:3 1 1 1 1 0.5 0.5 1 0.5 0.5 K3 K1 K2 V12 =   1 0 0.25 0 0.625 0 0 0 0.125   Ngonga Ngomo· Sherif · Lyko Colibri 13/24
  • 25. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Voting matrices • V12 =   1 0 0.25 0 0.625 0 0 0 0.125   • V13 =   1 0 0.5 0 0.5 0 0 0 0.25   • V23 =   1 0 0 0 0.5 0 0 0 0.25   • Post-processed matrices • ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2 ex1:3 ex2:1 ex2:2 ex2:3 ex3:1 ex3:2 ex3:3 1 1 1 1 0.5 0.5 1 0.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 14/24
  • 26. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Assume links in ˜Vij to be correct • ˜vij = 1 → All matrices agree on how to link (Ki , Kj ) e.g., ˜V12(ex1:1, ex2:1) • For all ˜vij < 1 assume either 1 Missing links e.g., ˜V12(ex1:3, ex2:3) not contained in M12 2 Weak links e.g., ˜V12(ex1:2, ex2:2) < 1 is due to M13(ex1:2, ex3:2) and M32(ex3:2, ex2:2) being 0.5 ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2 ex1:3 ex2:1 ex2:2 ex2:3 ex3:1 ex3:2 ex3:3 1 1 1 1 0.5 0.5 1 0.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 15/24
  • 27. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Assume links in ˜Vij to be correct • ˜vij = 1 → All matrices agree on how to link (Ki , Kj ) e.g., ˜V12(ex1:1, ex2:1) • For all ˜vij < 1 assume either 1 Missing links e.g., ˜V12(ex1:3, ex2:3) not contained in M12 2 Weak links e.g., ˜V12(ex1:2, ex2:2) < 1 is due to M13(ex1:2, ex3:2) and M32(ex3:2, ex2:2) being 0.5 ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1ex1:1 ex1:2 ex1:3 ex2:1ex2:1 ex2:2 ex2:3 ex3:1ex3:1 ex3:2 ex3:3 11 11 1 1 0.5 0.5 11 0.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 15/24
  • 28. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Assume links in ˜Vij to be correct • ˜vij = 1 → All matrices agree on how to link (Ki , Kj ) e.g., ˜V12(ex1:1, ex2:1) • For all ˜vij < 1 assume either 1 Missing links e.g., ˜V12(ex1:3, ex2:3) not contained in M12 2 Weak links e.g., ˜V12(ex1:2, ex2:2) < 1 is due to M13(ex1:2, ex3:2) and M32(ex3:2, ex2:2) being 0.5 ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2 ex1:3ex1:3 ex2:1 ex2:2 ex2:3ex2:3 ex3:1 ex3:2 ex3:3 1 1 1 1 0.5 0.5 1 0.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 15/24
  • 29. Outline Motivation Approach Evaluation Conclusion and Future Work Step 2: Voting • Assume links in ˜Vij to be correct • ˜vij = 1 → All matrices agree on how to link (Ki , Kj ) e.g., ˜V12(ex1:1, ex2:1) • For all ˜vij < 1 assume either 1 Missing links e.g., ˜V12(ex1:3, ex2:3) not contained in M12 2 Weak links e.g., ˜V12(ex1:2, ex2:2) < 1 is due to M13(ex1:2, ex3:2) and M32(ex3:2, ex2:2) being 0.5 ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2ex1:2 ex1:3 ex2:1 ex2:2ex2:2 ex2:3 ex3:1 ex3:2ex3:2 ex3:3 1 1 1 1 0.50.5 0.5 1 0.50.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 15/24
  • 30. Outline Motivation Approach Evaluation Conclusion and Future Work Step 3: Repair K1 K2 . . . Kn Unsupervised LD VotingRepair Mappings Voting matrices Repair instances Ngonga Ngomo· Sherif · Lyko Colibri 16/24
  • 31. Outline Motivation Approach Evaluation Conclusion and Future Work Step 3: Repair • Goal: Repair instance data so as to improve ˜vij < 1 • Link to be repaired is (ex1:2, ex2:2). • Reason for this link: • rs = ex1:2 and • rt = ex3:2. • Computing average similarity: • ¯σ(ex1:2) = 0.75 while • ¯σ(ex3:2) = 0.5. • Colibri overwrite the values of ex3:2 with those of ex1:2. ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2 ex1:3 ex2:1 ex2:2 ex2:3 ex3:1 ex3:2 ex3:3 1 1 1 1 0.5 0.5 1 0.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 17/24
  • 32. Outline Motivation Approach Evaluation Conclusion and Future Work Step 3: Repair • Goal: Repair instance data so as to improve ˜vij < 1 • Link to be repaired is (ex1:2, ex2:2). • Reason for this link: • rs = ex1:2 and • rt = ex3:2. • Computing average similarity: • ¯σ(ex1:2) = 0.75 while • ¯σ(ex3:2) = 0.5. • Colibri overwrite the values of ex3:2 with those of ex1:2. ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2ex1:2 ex1:3 ex2:1 ex2:2ex2:2 ex2:3 ex3:1 ex3:2ex3:2 ex3:3 1 1 1 1 0.50.5 0.5 1 0.50.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 17/24
  • 33. Outline Motivation Approach Evaluation Conclusion and Future Work Step 3: Repair • Goal: Repair instance data so as to improve ˜vij < 1 • Link to be repaired is (ex1:2, ex2:2). • Reason for this link: • rs = ex1:2 and • rt = ex3:2. • Computing average similarity: • ¯σ(ex1:2) = 0.75 while • ¯σ(ex3:2) = 0.5. • Colibri overwrite the values of ex3:2 with those of ex1:2. ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2ex1:2 ex1:3 ex2:1 ex2:2ex2:2 ex2:3 ex3:1 ex3:2ex3:2 ex3:3 1 1 1 11 0.50.5 0.5 1 0.50.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 17/24
  • 34. Outline Motivation Approach Evaluation Conclusion and Future Work Step 3: Repair • Goal: Repair instance data so as to improve ˜vij < 1 • Link to be repaired is (ex1:2, ex2:2). • Reason for this link: • rs = ex1:2 and • rt = ex3:2. • Computing average similarity: • ¯σ(ex1:2) = 0.75 while • ¯σ(ex3:2) = 0.5. • Colibri overwrite the values of ex3:2 with those of ex1:2. ˜V12 =   1 0 0 0 0.625 0 0 0 0.125   ex1:1 ex1:2ex1:2 ex1:3 ex2:1 ex2:2ex2:2 ex2:3 ex3:1 ex3:2ex3:2 ex3:3 1 1 1 1 0.50.5 0.5 1 0.50.5 0.5 K3 K1 K2 Ngonga Ngomo· Sherif · Lyko Colibri 17/24
  • 35. Outline Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Ngonga Ngomo· Sherif · Lyko Colibri 18/24
  • 36. Outline Motivation Approach Evaluation Conclusion and Future Work Benchmark Generation Approach • So far, no benchmark for linking n > 2 knowledge bases • Benchmark generation approach (Ferrara et al., 2011) • Generated m − 1 copies of initial dataset K1 • Alteration operators: • Misspellings • Abbreviations • Word permutations • Alteration strategy: • Pick random resource according to alteration probability • Pick random operator Ngonga Ngomo· Sherif · Lyko Colibri 18/24
  • 37. Outline Motivation Approach Evaluation Conclusion and Future Work Experimental Setup • Datasets: • Two synthetic datasets (OAEI2010) • Three real-world datasets (Koepcke et al., 2010) • Colibri: • Maximal number of iterations = 10 • Number of knowledge bases = {3, 4, 5} • Alteration probability ap = {10%, 20%, . . . , 50%} • Repeat each experiment 5 times Ngonga Ngomo· Sherif · Lyko Colibri 19/24
  • 38. Outline Motivation Approach Evaluation Conclusion and Future Work Experimental Results (synthetic dataset) KBs FEuclid FColibri Runtime (sec) Repaired links 3 0.89 0.98 0.4 43 4 0.90 1.00 0.9 35 5 0.88 1.00 1.3 34 • Restaurant dataset • Average values after 10 iterations • Alteration probability ap = 50% Ngonga Ngomo· Sherif · Lyko Colibri 20/24
  • 39. Outline Motivation Approach Evaluation Conclusion and Future Work Experimental Results (real-world dataset) KBs FEuclid FColibri Runtime (sec) Repaired links 3 0.86 0.98 81.8 300 4 0.85 0.99 160.4 150 5 0.84 0.88 246.8 60 • Amazon dataset • Average values after 10 iterations • Alteration probability ap = 50% Ngonga Ngomo· Sherif · Lyko Colibri 21/24
  • 40. Outline Motivation Approach Evaluation Conclusion and Future Work Results on the Restaurants dataset • Alteration probability ap = 50% • Knowledge bases = 5 iterationNr 2 4 6 0 20 40 60 80 100 Precision Recall F-Measure Error rate Full results at: https://github.com/AKSW/LIMES/tree/master/ evaluationsResults/colibri Ngonga Ngomo· Sherif · Lyko Colibri 22/24
  • 41. Outline Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Ngonga Ngomo· Sherif · Lyko Colibri 23/24
  • 42. Outline Motivation Approach Evaluation Conclusion and Future Work Conclusion and Future Work • Conclusion • Presented Colibri • Improved F-measure of Euclid up to 14% • Future Work • Evaluation on other datasets • Interactive scenarios (i.e., consult user before dataset repair) • Combination with other unsupervised solutions (e.g., Eagle) Ngonga Ngomo· Sherif · Lyko Colibri 23/24
  • 43. Outline Motivation Approach Evaluation Conclusion and Future Work Thank You! Questions? Mohamed Sherif Augustusplatz 10 D-04109 Leipzig sherif@informatik.uni-leipzig.de http://aksw.org/MohamedSherif http://limes.sf.net Ngonga Ngomo· Sherif · Lyko Colibri 24/24