DEER - Automating RDF Dataset Transformation and Enrichment
With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim to support this endeavour usually focus on the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how exemplary descriptions of enriched resources can be used to generate accurate enrichment pipelines. We evaluate our approach against 8 manually defined enrichment pipelines and show that it can learn accurate pipelines even when provided with a small number of training examples.
1.
Motivation Approach Evaluation Conclusion and Future Work
DEER
Automating RDF Dataset Transformation and Enrichment
Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and
Jens Lehmann
June 3, 2015
Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann — DEER 1/26
2.
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
5.
Why RDF Transformation & Enrichment?
Dataset: DrugBank
Goal: Gather information about companies related to drugs for a market study
[Figure: RDF graph. :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug. Two owl:sameAs links lead to db:Ibuprofen and db:Aspirin, and an rdfs:comment literal reads "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]
6.
RDF Transformation & Enrichment
Need for enriched datasets
  Tourism
  Question Answering
  Enhanced Reality
  ...
RDF transformation and enrichment
  Triples to be added to the original KB and/or
  Triples to be deleted from the original KB
[Figure: RDF graph in which :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug]
7.
Manual Knowledge Base Enrichment
Demand for the specification of data enrichment pipelines
  Describe how data is to be integrated (usually manually)
Manual customized enrichment pipelines
  ⊕ Lead to the expected results
  ⊖ Time consuming
  ⊖ Cannot be ported easily to other datasets
9.
Automatic Knowledge Base Enrichment
Enrichment pipeline $M : \mathcal{K} \to \mathcal{K}$ that maps a KB $K$ to an enriched KB $K'$ with $K' = M(K)$.
$M$ is an ordered list of atomic enrichment functions $m \in \mathcal{M}$:
$$M = \begin{cases} \phi & \text{if } K = K', \\ (m_1, \dots, m_n), \text{ where } m_i \in \mathcal{M},\ 1 \le i \le n & \text{otherwise.} \end{cases}$$
Research questions
  1 How to create self-configuring atomic enrichment functions $m \in \mathcal{M}$?
  2 How to automatically generate an enrichment pipeline $M$?
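The pipeline definition above can be read as plain function composition. The following is a minimal sketch, assuming a KB is modelled as a frozen set of (subject, predicate, object) string triples; `run_pipeline`, `add_label` and `drop_types` are hypothetical names used only for illustration and are not part of DEER.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
EnrichmentFunction = Callable[[KB], KB]


def run_pipeline(pipeline: Sequence[EnrichmentFunction], kb: KB) -> KB:
    """Apply m_1, ..., m_n in order; an empty pipeline returns K unchanged."""
    for m in pipeline:
        kb = m(kb)
    return kb


def add_label(kb: KB) -> KB:
    # Hypothetical atomic function that adds one triple.
    return kb | {(":Ibuprofen", "rdfs:label", "Ibuprofen")}


def drop_types(kb: KB) -> KB:
    # Hypothetical atomic function that deletes all "a" (rdf:type) triples.
    return frozenset(t for t in kb if t[1] != "a")


source: KB = frozenset({(":Ibuprofen", "a", ":Drug")})
enriched = run_pipeline([add_label, drop_types], source)
```

The order of the list matters, which is why the pipeline is defined as an ordered list rather than a set of functions.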
15.
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Dereferences a pre-specified set of predicates
Adds the found predicates to the source dataset
[Figure: RDF graph. :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug, with owl:sameAs links to db:Ibuprofen and db:Aspirin. The rdfs:comment literal "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...", found by following the link, is also added to the source dataset.]
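A minimal sketch of the dereferencing step, assuming triples are plain string tuples. The `LINKED` dictionary is a hypothetical in-memory stand-in for the linked dataset; a real implementation would fetch the description of each linked URI over HTTP. The function name `dereference` is likewise an assumption.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."

# Hypothetical stand-in for the linked (remote) dataset.
LINKED: Dict[str, Set[Triple]] = {
    "db:Ibuprofen": {("db:Ibuprofen", "rdfs:comment", COMMENT)},
}


def dereference(kb: Set[Triple], predicates: Set[str]) -> Set[Triple]:
    """Follow owl:sameAs links and copy the requested predicates to the source resource."""
    enriched = set(kb)
    for s, p, o in kb:
        if p != "owl:sameAs":
            continue
        for _, linked_p, linked_o in LINKED.get(o, set()):
            if linked_p in predicates:
                enriched.add((s, linked_p, linked_o))
    return enriched


source = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Quinine", "a", ":Drug"),
}
result = dereference(source, {":relatedCompany", "rdfs:comment"})
```

Only predicates that actually exist in the linked description are added, so requesting :relatedCompany here has no effect.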
16.
Self-Configuration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs (concise bounded descriptions) that are missing from the source CBDs
Non-enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen .
Enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      :relatedCompany :BootsCompany .
Dp = {:relatedCompany, rdfs:comment}
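The self-configuration step above amounts to a set difference over predicates. A minimal sketch, assuming CBDs are given as sets of string triples; the function name `missing_predicates` is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def missing_predicates(source_cbd: Set[Triple], enriched_cbd: Set[Triple]) -> Set[str]:
    """Return the predicates used in the enriched CBD but absent from the source CBD."""
    source_preds = {p for _, p, _ in source_cbd}
    enriched_preds = {p for _, p, _ in enriched_cbd}
    return enriched_preds - source_preds


source_cbd = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
enriched_cbd = source_cbd | {
    (":Ibuprofen", "rdfs:comment", "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."),
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
}
dp = missing_predicates(source_cbd, enriched_cbd)
```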
20.
Self-Configuration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
[Figure, captioned "CBD of Ibuprofen": RDF graph. :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug, with owl:sameAs links to db:Ibuprofen and db:Aspirin and an rdfs:comment literal "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .
27.
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source dataset
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      fox:relatedTo :BootsCompany .
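A minimal sketch of the NLP enrichment idea. It does not call a real Named Entity Recognition tool: the `GAZETTEER` lookup table is a hypothetical stand-in that matches known surface forms in literal values, and `nlp_enrich` is an assumed name. Only the fox:relatedTo predicate is taken from the slides.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical gazetteer standing in for a real NER component.
GAZETTEER: Dict[str, str] = {"Boots company": ":BootsCompany"}


def nlp_enrich(kb: Set[Triple], text_predicate: str = "rdfs:comment") -> Set[Triple]:
    """Link each resource to the entities mentioned in its literal values."""
    enriched = set(kb)
    for s, p, o in kb:
        if p != text_predicate:
            continue
        for surface_form, entity in GAZETTEER.items():
            if surface_form.lower() in o.lower():
                enriched.add((s, "fox:relatedTo", entity))
    return enriched


kb = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "rdfs:comment", "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."),
}
result = nlp_enrich(kb)
```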
30.
Self-Configuration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      fox:relatedTo :BootsCompany .
33.
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of pre-specified predicates to a target ontology
Example: fox:relatedTo is mapped to :relatedCompany
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      :relatedCompany :BootsCompany .
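A minimal sketch of predicate conformation, assuming triples are string tuples and the mapping is given as a dictionary from source predicate to target predicate. The function name `conform_predicates` is hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def conform_predicates(kb: Set[Triple], mapping: Dict[str, str]) -> Set[Triple]:
    """Rewrite every predicate that has an entry in the mapping; keep all others as they are."""
    return {(s, mapping.get(p, p), o) for s, p, o in kb}


kb = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "fox:relatedTo", ":BootsCompany"),
}
conformed = conform_predicates(kb, {"fox:relatedTo": ":relatedCompany"})
```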
36.
Self-Configuration
III. Predicate conformation Enrichment Function
Finds lists of predicates Ps and Pt from the source and target datasets, respectively, that share the same subjects and objects
Replaces each Ps with its respective Pt
NLP enriched CBD of Ibuprofen (fox:relatedTo is to be replaced by :relatedCompany)
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      fox:relatedTo :BootsCompany .
Enriched CBD of Ibuprofen (positive example target)
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      :relatedCompany :BootsCompany .
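A minimal sketch of how the (Ps, Pt) pairs could be found, assuming both CBDs are sets of string triples. The function name `learn_predicate_mapping` and the rule of accepting a pair only when the target has exactly one candidate predicate are assumptions made for this illustration.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def learn_predicate_mapping(source: Set[Triple], target: Set[Triple]) -> Dict[str, str]:
    """Pair each source predicate Ps with the target predicate Pt linking the same subject and object."""
    target_predicates: Dict[Tuple[str, str], Set[str]] = {}
    for s, p, o in target:
        target_predicates.setdefault((s, o), set()).add(p)

    mapping: Dict[str, str] = {}
    for s, ps, o in source:
        candidates = target_predicates.get((s, o), set())
        if ps in candidates:
            continue  # the predicate already matches the target
        if len(candidates) == 1:
            mapping[ps] = next(iter(candidates))
    return mapping


nlp_enriched = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Ibuprofen", "fox:relatedTo", ":BootsCompany"),
}
positive_example = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
}
mapping = learn_predicate_mapping(nlp_enriched, positive_example)
```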
37.
KB Enrichment Refinement Operator
Input
  Set of atomic enrichment functions $\mathcal{M}$
  Set of positive examples $E$
Refinement Operator
  $$\rho(M) = \bigcup_{m \in \mathcal{M}} M \mathbin{+\!+} m$$
  ($+\!+$ is the list append operator)
Output
  Enrichment pipeline $M$
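A minimal sketch of the refinement operator as stated in the formula: every refinement of a pipeline appends exactly one atomic function. Pipelines are modelled as tuples and the atomic functions are represented by their names; `refine` is a hypothetical name. The refinement tree on the slides does not show children such as (m1, m1), so the actual implementation presumably prunes some of these candidates; this sketch does not.

```python
from typing import List, Sequence, Tuple

Pipeline = Tuple[str, ...]


def refine(pipeline: Pipeline, atomic_functions: Sequence[str]) -> List[Pipeline]:
    """Return all pipelines obtained by appending one atomic function to the input pipeline."""
    return [pipeline + (m,) for m in atomic_functions]


atomic = ["m1", "m2", "m3"]
first_level = refine((), atomic)
second_level = refine(("m1",), atomic)
```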
40.
Positive Example
Non-enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen .
Enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      :relatedCompany :BootsCompany .
48.
Learning Algorithm
1 Start with an empty enrichment pipeline $M = \bot$
2 Self-configure all $m_i \in \mathcal{M}$, add each as a child of ⊥
3 Select the most promising node
4 Expand the most promising node
Refinement tree (one level per line):
  ⊥
  (m1) (m2) (m3)
  (m1, m2) (m1, m3) (m3, m1) (m3, m2)
  (m3, m2, m1)
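The four steps above can be sketched as a best-first search over pipelines. This is a simplified illustration under stated assumptions, not the authors' implementation: it skips self-configuration, scores nodes by plain F-measure over triples (no complexity penalty), and all names (`f_measure`, `learn_pipeline`, `add_comment`, `relate_company`) are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
Fn = Callable[[KB], KB]


def f_measure(output: KB, target: KB) -> float:
    """F-measure of the produced triples against the target triples."""
    true_positives = len(output & target)
    if true_positives == 0:
        return 1.0 if output == target else 0.0
    precision = true_positives / len(output)
    recall = true_positives / len(target)
    return 2 * precision * recall / (precision + recall)


def learn_pipeline(functions: Sequence[Fn], source: KB, target: KB,
                   max_iterations: int = 10) -> Tuple[Fn, ...]:
    def score(pipeline: Tuple[Fn, ...]) -> float:
        kb = source
        for m in pipeline:
            kb = m(kb)
        return f_measure(kb, target)

    root: Tuple[Fn, ...] = ()          # the empty pipeline
    leaves: List[Tuple[Fn, ...]] = [root]
    best, best_score = root, score(root)
    for _ in range(max_iterations):
        if best_score == 1.0 or not leaves:
            break                      # optimal pipeline found
        node = max(leaves, key=score)  # select the most promising leaf
        leaves.remove(node)
        for m in functions:            # expand it by one atomic function
            child = node + (m,)
            leaves.append(child)
            child_score = score(child)
            if child_score > best_score:
                best, best_score = child, child_score
    return best


def add_comment(kb: KB) -> KB:
    # Hypothetical atomic function standing in for dereferencing.
    return kb | {(":Ibuprofen", "rdfs:comment", "Boots")}


def relate_company(kb: KB) -> KB:
    # Hypothetical atomic function that only fires once a comment is present.
    if any(p == "rdfs:comment" for _, p, _ in kb):
        return kb | {(":Ibuprofen", ":relatedCompany", ":BootsCompany")}
    return kb


source: KB = frozenset({(":Ibuprofen", "a", ":Drug")})
target: KB = relate_company(add_comment(source))
learned = learn_pipeline([add_comment, relate_company], source, target)
```

In this toy run the second function depends on the output of the first, so only the ordered pipeline (add_comment, relate_company) reproduces the target.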
49.
Most Promising Node Selection
Node complexity $c(n)$
  Linear combination of the node's children count and level
Node fitness $f(n)$
  Difference between the F-measure of the node's enrichment pipeline and its weighted complexity, $f(n) = F(n) - \omega \cdot c(n)$
  $\omega$ controls the tradeoff between
    Greedy search ($\omega = 0$)
    Search strategies closer to breadth-first search ($\omega > 0$)
Most promising node
  The leaf node with the maximum fitness across the whole refinement tree
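A minimal sketch of the node selection rule. The slides do not give the coefficients of the linear combination in $c(n)$, so the equal weights of 1.0 used here are an assumption, as are the `Node` class and the function names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(eq=False)
class Node:
    f_measure: float          # F(n) of the node's pipeline
    level: int                # depth in the refinement tree
    children: List["Node"] = field(default_factory=list)


def complexity(n: Node, child_weight: float = 1.0, level_weight: float = 1.0) -> float:
    """c(n): linear combination of children count and level (weights are assumed)."""
    return child_weight * len(n.children) + level_weight * n.level


def fitness(n: Node, omega: float) -> float:
    """f(n) = F(n) - omega * c(n)."""
    return n.f_measure - omega * complexity(n)


def leaves(n: Node) -> Iterator[Node]:
    if not n.children:
        yield n
    for child in n.children:
        yield from leaves(child)


def most_promising(root: Node, omega: float) -> Node:
    return max(leaves(root), key=lambda n: fitness(n, omega))


deep = Node(f_measure=0.9, level=2)
shallow = Node(f_measure=0.6, level=1)
inner = Node(f_measure=0.5, level=1, children=[deep])
root = Node(f_measure=0.0, level=0, children=[shallow, inner])
```

With `omega = 0` the deeper leaf with the higher F-measure is selected (greedy behaviour); with `omega = 0.75` the level penalty makes the shallower leaf win, which pushes the search toward breadth-first behaviour.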
53.
Experimental Setup
Datasets
  1 manual experimental enrichment pipeline for Jamendo
  2 manual experimental enrichment pipelines for DrugBank
  5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion)
Learning Algorithm
  6 atomic enrichment functions
  Termination criterion:
    Maximum of 10 iterations
    Optimal enrichment pipeline found (F-score = 1)
55.
Configuration of the Search Strategy
Node fitness
  $f(n) = F(n) - \omega \cdot c(n)$
  $\omega$ controls the tradeoff between
    Greedy search ($\omega = 0$)
    Search strategies closer to breadth-first search ($\omega > 0$)
Result: $\omega = 0.75$ leads to the best results
  P = precision, R = recall, F = F-measure
  ω      P     R     F
  0      1.0   0.99  0.99
  0.25   1.0   0.99  0.99
  0.50   1.0   0.99  0.99
  0.75   1.0   1.0   1.0
  1.0    1.0   0.99  0.99
58.
Conclusion and Future Work
Conclusion
  Presented self-configuring atomic enrichment functions
  Presented an approach for learning enrichment pipelines based on a refinement operator
  Showed that our approach can easily reconstruct manually created enrichment pipelines
Future Work
  Parallelize the algorithm across several CPUs and add load balancing
  Support directed acyclic graphs as enrichment specifications by allowing datasets to be split and merged
  Pro-active enrichment strategies and active learning
60.
Thank You!
Questions?
Mohamed Sherif
Augustusplatz 10
D-04109 Leipzig
sherif@informatik.uni-leipzig.de
http://aksw.org/MohamedSherif
http://aksw.org/Projects/DEER
#akswgroup