Motivation Approach Evaluation Conclusion and Future Work
DEER
Automating RDF Dataset Transformation and Enrichment
Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and
Jens Lehmann
June 3, 2015
Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann — DEER 1/26
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Why RDF Transformation & Enrichment?
Dataset: DrugBank
Goal: Gather information about companies related to drugs for a market study
[Figure: RDF graph. :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) :Drug. :Ibuprofen and :Aspirin are linked via owl:sameAs to db:Ibuprofen and db:Aspirin. db:Ibuprofen has the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]
RDF Transformation & Enrichment
Need for enriched datasets
  Tourism
  Question Answering
  Enhanced Reality
  ...
RDF transformation and enrichment
  Triples to be added to the original KB and/or
  Triples to be deleted from the original KB
Manual Knowledge Base Enrichment
Demands for the specification of data enrichment pipelines
  Describe how data is to be integrated (usually manually)
Manual customized enrichment pipelines
  ⊕ Lead to the expected results
  ⊖ Time-consuming
  ⊖ Cannot be ported easily to other datasets
Automatic Knowledge Base Enrichment
Enrichment pipeline $M : \mathcal{K} \to \mathcal{K}$ that maps a KB $K$ to an enriched KB $K'$ with $K' = M(K)$.
$M$ is an ordered list of atomic enrichment functions $m \in \mathcal{M}$:
$$M = \begin{cases} \phi & \text{if } K = K', \\ (m_1, \ldots, m_n), \text{ where } m_i \in \mathcal{M},\ 1 \le i \le n & \text{otherwise.} \end{cases}$$
Research questions
1 How to create self-configuring atomic enrichment functions $m \in \mathcal{M}$?
2 How to automatically generate an enrichment pipeline $M$?
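A minimal sketch of the pipeline definition above, assuming a KB is modelled as a frozen set of (subject, predicate, object) string triples. The names `apply_pipeline`, `add_label` and `drop_type` are hypothetical and are not part of DEER.

```python
from functools import reduce
from typing import Callable, FrozenSet, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
EnrichmentFunction = Callable[[KB], KB]


def apply_pipeline(pipeline: Sequence[EnrichmentFunction], kb: KB) -> KB:
    """Apply m1, ..., mn in order; the empty pipeline returns the KB unchanged."""
    return reduce(lambda current, m: m(current), pipeline, kb)


# Two hypothetical atomic functions: one adds a triple, one deletes a triple.
def add_label(kb: KB) -> KB:
    return kb | {(":Aspirin", "rdfs:label", "Aspirin")}


def drop_type(kb: KB) -> KB:
    return kb - {(":Aspirin", "a", ":Drug")}


kb = frozenset({(":Aspirin", "a", ":Drug")})
enriched = apply_pipeline([add_label, drop_type], kb)
```

The order of the list matters: each function receives the output of the previous one, which is why the pipeline is a list rather than a set.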
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Dereferences a pre-specified set of predicates
Adds the found predicates to the source dataset
[Figure: the DrugBank graph from the motivation example. :Ibuprofen and :Aspirin are linked via owl:sameAs to db:Ibuprofen and db:Aspirin; the rdfs:comment of db:Ibuprofen ("Ibuprofen was extracted by the research arm of Boots company during the 1960s ...") is added to :Ibuprofen.]
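The dereferencing step can be sketched roughly as follows. This is an illustration under assumptions, not DEER's implementation: triples are plain string tuples, and the hypothetical `lookup` dictionary stands in for actually dereferencing the linked resource over the network.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def dereference(source: Set[Triple],
                lookup: Dict[str, Set[Triple]],
                predicates: Iterable[str],
                link: str = "owl:sameAs") -> Set[Triple]:
    """Copy the wanted predicates of linked resources onto the source subjects."""
    wanted = set(predicates)
    enriched = set(source)
    for s, p, o in source:
        if p != link:
            continue
        # `lookup` is a stand-in for fetching the description of resource `o`.
        for _, p2, o2 in lookup.get(o, set()):
            if p2 in wanted:
                enriched.add((s, p2, o2))
    return enriched


source = {(":Ibuprofen", "a", ":Drug"),
          (":Ibuprofen", "owl:sameAs", "db:Ibuprofen")}
lookup = {"db:Ibuprofen": {("db:Ibuprofen", "rdfs:comment",
                            "Ibuprofen was extracted ...")}}
result = dereference(source, lookup, {"rdfs:comment", ":relatedCompany"})
```

Predicates that are requested but absent from the linked resource (here `:relatedCompany`) simply contribute nothing.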
Self-Configuration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs (concise bounded descriptions) that are missing from the source CBDs
Non-enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen .
Enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      :relatedCompany :BootsCompany ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .
Dp = {:relatedCompany, rdfs:comment}
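The self-configuration step above amounts to a set difference over predicates. A small sketch, assuming each CBD is given as a set of string triples; the function name `missing_predicates` is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def missing_predicates(source_cbd: Set[Triple], enriched_cbd: Set[Triple]) -> Set[str]:
    """Predicates used in the enriched CBD but absent from the source CBD."""
    source_preds = {p for _, p, _ in source_cbd}
    enriched_preds = {p for _, p, _ in enriched_cbd}
    return enriched_preds - source_preds


source_cbd = {(":Ibuprofen", "a", ":Drug"),
              (":Ibuprofen", "owl:sameAs", "db:Ibuprofen")}
enriched_cbd = source_cbd | {
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
    (":Ibuprofen", "rdfs:comment", "Ibuprofen was extracted ..."),
}
dp = missing_predicates(source_cbd, enriched_cbd)
```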
Self-Configuration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
  [Figure: the DrugBank graph. :Ibuprofen is linked via owl:sameAs to db:Ibuprofen, which has the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      fox:relatedTo :BootsCompany .
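A toy sketch of the NLP enrichment idea, under loud assumptions: the named entity recognizer is replaced by a hypothetical gazetteer lookup (`toy_ner`), where a real recognizer would be plugged in, and the predicate names are passed in as plain strings.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def toy_ner(text: str, gazetteer: Dict[str, str]) -> List[str]:
    """Hypothetical stand-in for a real NER tool: case-insensitive substring match."""
    lowered = text.lower()
    return [uri for surface, uri in gazetteer.items() if surface.lower() in lowered]


def nlp_enrich(kb: Set[Triple],
               recognize: Callable[[str], List[str]],
               text_predicate: str = "rdfs:comment",
               link_predicate: str = "fox:relatedTo") -> Set[Triple]:
    """Link each subject to the entities recognized in its literal text."""
    enriched = set(kb)
    for s, p, o in kb:
        if p == text_predicate:
            for entity in recognize(o):
                enriched.add((s, link_predicate, entity))
    return enriched


gazetteer = {"Boots company": ":BootsCompany"}  # assumed, for illustration only
kb = {(":Ibuprofen", "rdfs:comment",
       "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...")}
result = nlp_enrich(kb, lambda text: toy_ner(text, gazetteer))
```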
Self-Configuration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      fox:relatedTo :BootsCompany .
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of pre-specified predicates to a target ontology
[Figure: CBD of :Ibuprofen in which the predicate fox:relatedTo (pointing to :BootsCompany) is replaced by :relatedCompany; the a :Drug, owl:sameAs db:Ibuprofen and rdfs:comment triples are unchanged.]
Self-Configuration
III. Predicate conformation Enrichment Function
Finds lists of predicates Ps and Pt from the source and target datasets, respectively, that share the same subjects and objects
Replaces each Ps with its respective Pt (here fox:relatedTo with :relatedCompany)
NLP enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      fox:relatedTo :BootsCompany .
Enriched CBD of Ibuprofen (positive example target)
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
      :relatedCompany :BootsCompany .
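One way to sketch the predicate conformation self-configuration, assuming triples are string tuples. The function names and the tie-breaking rule (picking the lexicographically smallest candidate when several target predicates match) are assumptions made for this illustration.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def learn_predicate_mapping(source: Set[Triple], target: Set[Triple]) -> Dict[str, str]:
    """Map a source predicate to a target predicate that shares subject and object."""
    target_by_so: Dict[Tuple[str, str], Set[str]] = {}
    for s, p, o in target:
        target_by_so.setdefault((s, o), set()).add(p)
    mapping: Dict[str, str] = {}
    for s, p, o in source:
        candidates = target_by_so.get((s, o), set())
        if candidates and p not in candidates:
            # Assumption: break ties deterministically when several predicates match.
            mapping[p] = min(candidates)
    return mapping


def conform(kb: Set[Triple], mapping: Dict[str, str]) -> Set[Triple]:
    """Rewrite every mapped predicate; leave all other triples untouched."""
    return {(s, mapping.get(p, p), o) for s, p, o in kb}


source = {(":Ibuprofen", "a", ":Drug"),
          (":Ibuprofen", "fox:relatedTo", ":BootsCompany")}
target = {(":Ibuprofen", "a", ":Drug"),
          (":Ibuprofen", ":relatedCompany", ":BootsCompany")}
mapping = learn_predicate_mapping(source, target)
conformed = conform(source, mapping)
```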
KB Enrichment Refinement Operator
Input
  Set of atomic enrichment functions $\mathcal{M}$
  Set of positive examples $E$
Refinement Operator
  $$\rho(M) = \bigcup_{\forall m \in \mathcal{M}} M \mathbin{+\!\!+} m$$
  ($+\!\!+$ is the list append operator)
Output
  Enrichment pipeline $M$
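Read literally, the refinement operator yields one refined pipeline per atomic function. A minimal sketch, assuming pipelines are tuples and atomic functions are represented by their names; `refine` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Pipeline = Tuple[str, ...]


def refine(pipeline: Pipeline, atomic_functions: Sequence[str]) -> List[Pipeline]:
    """Return every pipeline obtained by appending one atomic function."""
    return [pipeline + (m,) for m in atomic_functions]


children = refine(("m1",), ["m1", "m2", "m3"])
```

This follows the formula as written, so it also produces pipelines that repeat a function; any pruning of such candidates would be an additional design choice.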
Positive Example
Non-enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen .
Enriched CBD of Ibuprofen
  :Ibuprofen a :Drug ;
      owl:sameAs db:Ibuprofen ;
      :relatedCompany :BootsCompany ;
      rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .
Learning Algorithm
1 Start with the empty enrichment pipeline M = ⊥
2 Self-configure all $m_i \in \mathcal{M}$ and add each as a child of ⊥
3 Select the most promising node
4 Expand the most promising node
Refinement tree (each node is indented under its parent):
⊥
  (m1)
    (m1, m2)
    (m1, m3)
  (m2)
  (m3)
    (m3, m1)
    (m3, m2)
      (m3, m2, m1)
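The four steps can be sketched as a best-first search over the refinement tree. This is an illustration under assumptions rather than DEER's implementation: atomic functions are represented by name only (self-configuration is omitted), `f_measure` and `fitness` are caller-supplied scoring functions, and the loop stops at F = 1 or after `max_iterations` expansions.

```python
from typing import Callable, List, Sequence, Tuple

Pipeline = Tuple[str, ...]


def learn_pipeline(atomic_functions: Sequence[str],
                   f_measure: Callable[[Pipeline], float],
                   fitness: Callable[[Pipeline], float],
                   max_iterations: int = 10) -> Pipeline:
    """Best-first search: repeatedly expand the leaf with the highest fitness."""
    best: Pipeline = ()            # the empty pipeline (root of the tree)
    leaves: List[Pipeline] = [best]
    for _ in range(max_iterations):
        if f_measure(best) == 1.0 or not leaves:
            break
        node = max(leaves, key=fitness)      # select the most promising leaf
        leaves.remove(node)
        children = [node + (m,) for m in atomic_functions]  # expand it
        leaves.extend(children)
        best = max([best, *children], key=f_measure)
    return best


# Toy scoring (assumed): fraction of a hidden target pipeline matched as a prefix.
target = ("m3", "m2", "m1")


def toy_f(p: Pipeline) -> float:
    k = 0
    for a, b in zip(p, target):
        if a != b:
            break
        k += 1
    return k / max(len(p), len(target))


learned = learn_pipeline(["m1", "m2", "m3"], toy_f, toy_f)
```

Passing the same function for `f_measure` and `fitness` gives a purely greedy search; a fitness that also penalizes complexity would explore the tree more broadly.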
Most Promising Node Selection
Node complexity c(n)
  Linear combination of the node's children count and level
Node fitness f(n)
  Difference between the F-measure of the node's enrichment pipeline and its weighted complexity: f(n) = F(n) − ω · c(n)
  ω controls the tradeoff between greedy search (ω = 0) and search strategies closer to breadth-first search (ω > 0)
Most promising node
  The leaf node with the maximum fitness across the whole refinement tree
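A small numeric sketch of the fitness function. The slides only say that c(n) is a linear combination of children count and level, so the weights `alpha` and `beta` below are hypothetical placeholders.

```python
def complexity(children_count: int, level: int,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """c(n): linear combination of children count and level (weights assumed)."""
    return alpha * children_count + beta * level


def fitness(f_measure: float, children_count: int, level: int, omega: float) -> float:
    """f(n) = F(n) - omega * c(n)."""
    return f_measure - omega * complexity(children_count, level)


greedy = fitness(0.9, children_count=0, level=2, omega=0.0)
penalized = fitness(0.9, children_count=0, level=2, omega=0.25)
```

With ω = 0 only the F-measure counts, so the search always follows the best-scoring branch; with ω > 0 deeper nodes are penalized, which pushes the search toward shallower, unexplored leaves.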
Experimental Setup
Datasets
  1 manual experimental enrichment pipeline for Jamendo
  2 manual experimental enrichment pipelines for DrugBank
  5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion)
Learning Algorithm
  6 atomic enrichment functions
  Termination criteria:
    Maximum of 10 iterations
    Optimal enrichment pipeline found (F-score = 1)
Configuration of the Search Strategy
Node fitness
  f(n) = F(n) − ω · c(n)
  ω controls the tradeoff between greedy search (ω = 0) and search strategies closer to breadth-first search (ω > 0)
Result: ω = 0.75 leads to the best results

ω     P    R     F
0     1.0  0.99  0.99
0.25  1.0  0.99  0.99
0.50  1.0  0.99  0.99
0.75  1.0  1.0   1.0
1.0   1.0  0.99  0.99
Effect of Positive Examples
Manual M       Examples count  Size of M  Time M(KB)  Size of learned M'  Time M'(KB)  Learn time  Iterations count  F-score
M1 (DBpedia)   1               1          0.2         1                   1.6          1.3         1                 1.0
M1 (DBpedia)   2               1          0.2         1                   1.8          1.3         1                 1.0
M2 (DBpedia)   1               2          23.3        1                   0.1          0.2         1                 0.99
M2 (DBpedia)   2               2          15          2                   17           0.3         9                 0.99
M3 (DBpedia)   1               3          14.7        3                   15.2         6.1         9                 0.99
M3 (DBpedia)   2               3          15          2                   15.1         0.1         9                 0.99
M4 (DBpedia)   1               4          0.4         2                   0.1          0.7         2                 0.99
M4 (DBpedia)   2               4          0.6         2                   0.3          0.9         2                 0.99
M5 (DBpedia)   1               5          22          2                   0.1          0.7         2                 1.0
M5 (DBpedia)   2               5          25.5        2                   0.2          0.9         2                 1.0
M1 (DrugBank)  1               2          3.5         1                   4.1          0.1         10                0.99
M1 (DrugBank)  2               2          3.6         1                   3.4          0.1         10                0.99
M2 (DrugBank)  1               3          25.2        1                   0.1          0.1         10                0.99
M2 (DrugBank)  2               3          22.8        1                   0.1          0.1         10                0.99
M1 (Jamendo)   1               1          10.9        2                   10.6         0.1         2                 0.99
M1 (Jamendo)   2               1          10.4        2                   10.4         0.1         1                 0.99
Conclusion and Future Work
Conclusion
  Presented self-configuring atomic enrichment functions
  Presented an approach for learning enrichment pipelines based on a refinement operator
  Showed that our approach can easily reconstruct manually created enrichment pipelines
Future Work
  Parallelize the algorithm across several CPUs and add load balancing
  Support directed acyclic graphs as enrichment specifications by allowing datasets to be split and merged
  Pro-active enrichment strategies and active learning
Thank You!
Questions?
Mohamed Sherif
Augustusplatz 10
D-04109 Leipzig
sherif@informatik.uni-leipzig.de
http://aksw.org/MohamedSherif
http://aksw.org/Projects/DEER
#akswgroup
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
itech2017
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 

Recently uploaded (20)

Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
An Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering TechniquesAn Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering Techniques
 
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABSDESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 

DEER - Automating RDF Dataset Transformation and Enrichment

  • 1. Motivation Approach Evaluation Conclusion and Future Work DEER Automating RDF Dataset Transformation and Enrichment Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann June 3, 2015 Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann — DEER 1/26
  • 2. Outline: 1 Motivation; 2 Approach; 3 Evaluation; 4 Conclusion and Future Work
  • 4. Why RDF Transformation & Enrichment? Dataset: DrugBank. Goal: gather information about companies related to drugs for a market study. [Figure: RDF graph in which :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug; :Ibuprofen and :Aspirin are linked via owl:sameAs to db:Ibuprofen and db:Aspirin, and db:Ibuprofen carries the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]
  • 6. RDF Transformation & Enrichment. Need for enriched datasets: Tourism, Question Answering, Enhanced Reality, ... RDF transformation and enrichment: triples to be added to the original KB and/or triples to be deleted from the original KB. [Figure: RDF graph in which :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug]
  • 7. Manual Knowledge Base Enrichment. Demands the specification of data enrichment pipelines that describe how data is to be integrated (usually manually). Manual customized enrichment pipelines: ⊕ lead to the expected results; ⊖ are time consuming; ⊖ cannot be ported easily to other datasets.
  • 9. Automatic Knowledge Base Enrichment. An enrichment pipeline $M : \mathcal{K} \to \mathcal{K}$ maps a KB $K$ to an enriched KB $K'$ with $K' = M(K)$. $M$ is an ordered list of atomic enrichment functions $m \in \mathcal{M}$: $M = \phi$ if $K = K'$, and $M = (m_1, \ldots, m_n)$ with $m_i \in \mathcal{M}$, $1 \le i \le n$, otherwise. Research questions: (1) How to create self-configuring atomic enrichment functions $m \in \mathcal{M}$? (2) How to automatically generate an enrichment pipeline $M$?
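The pipeline definition above can be read as plain function composition. The following is a minimal sketch under stated assumptions, not DEER's actual implementation: a KB is modelled as a frozen set of (subject, predicate, object) string triples, and `run_pipeline` and `add_comment` are hypothetical names introduced only for illustration.

```python
from functools import reduce
from typing import Callable, FrozenSet, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
EnrichmentFunction = Callable[[KB], KB]


def run_pipeline(pipeline: Sequence[EnrichmentFunction], kb: KB) -> KB:
    """Apply m_1, ..., m_n in order; the empty pipeline returns kb unchanged."""
    return reduce(lambda current, m: m(current), pipeline, kb)


# Hypothetical atomic enrichment function, used only for illustration.
def add_comment(kb: KB) -> KB:
    comment = (":Ibuprofen", "rdfs:comment", "Ibuprofen was extracted ...")
    return kb | {comment}


source: KB = frozenset({(":Ibuprofen", "a", ":Drug")})
enriched = run_pipeline([add_comment], source)
```

With an empty list, `run_pipeline` returns the input unchanged, which corresponds to the case $K = K'$ in the definition.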
  • 12. Atomic Enrichment Functions I: Dereferencing atomic enrichment function. Datasets are linked (e.g., using owl:sameAs). Dereferences a pre-specified set of predicates. Adds the found predicates to the source dataset. [Figure: RDF graph in which :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug; :Ibuprofen and :Aspirin are linked via owl:sameAs to db:Ibuprofen and db:Aspirin; the rdfs:comment of db:Ibuprofen, "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...", is copied to :Ibuprofen]
  • 16. Self-Configuration I: Dereferencing Enrichment Function. Finds the set of predicates Dp from the enriched CBDs (concise bounded descriptions) that are missing from the source CBDs. Non-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen. Enriched CBD of Ibuprofen: :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen; :relatedCompany :BootsCompany; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...". Dp = {:relatedCompany, rdfs:comment}
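The computation of Dp on this slide amounts to a set difference over predicates. A minimal sketch, assuming CBDs are given as sets of (subject, predicate, object) string triples; the function name `missing_predicates` is hypothetical and the triples are copied from the Ibuprofen example:

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def missing_predicates(source_cbd: Set[Triple], enriched_cbd: Set[Triple]) -> Set[str]:
    """Predicates that occur in the enriched CBD but not in the source CBD."""
    source_preds = {p for _, p, _ in source_cbd}
    enriched_preds = {p for _, p, _ in enriched_cbd}
    return enriched_preds - source_preds


non_enriched = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
enriched = non_enriched | {
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
    (":Ibuprofen", "rdfs:comment", COMMENT),
}

dp = missing_predicates(non_enriched, enriched)
```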
  • 20. Self-Configuration I: Dereferencing Enrichment Function. Dereferences Dp = {:relatedCompany, rdfs:comment}. [Figure: RDF graph in which :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug; :Ibuprofen and :Aspirin are linked via owl:sameAs to db:Ibuprofen and db:Aspirin; db:Ibuprofen carries the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."] Finds only rdfs:comment and adds it to the source dataset. Dereferencing-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...".
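The dereferencing step can be sketched as follows. This is an illustration under assumptions rather than the tool's code: the linked dataset is a local set of triples standing in for what would normally be fetched by dereferencing the linked URIs, and `dereference` is a hypothetical name.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def dereference(source: Set[Triple], linked: Set[Triple], predicates: Set[str]) -> Set[Triple]:
    """Follow owl:sameAs links and copy triples whose predicate is in `predicates`."""
    enriched = set(source)
    for s, p, o in source:
        if p != "owl:sameAs":
            continue
        for linked_s, linked_p, linked_o in linked:
            if linked_s == o and linked_p in predicates:
                # Attach the found value to the source resource.
                enriched.add((s, linked_p, linked_o))
    return enriched


source = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
# Stand-in for the linked dataset; it holds no :relatedCompany triple.
linked = {("db:Ibuprofen", "rdfs:comment", COMMENT)}

result = dereference(source, linked, {":relatedCompany", "rdfs:comment"})
```

As on the slide, only rdfs:comment is found, so :relatedCompany has to come from a later enrichment step.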
  • 24. Atomic Enrichment Functions II: NLP atomic enrichment function. Datatype objects contain unstructured information. Uses Named Entity Recognition to extract implicit data. Adds the extracted entities to the source dataset. [Figure: CBD of :Ibuprofen (a :Drug; owl:sameAs db:Ibuprofen; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...") extended with the extracted triple :Ibuprofen fox:relatedTo :BootsCompany]
  • 28. Self-Configuration II: NLP Enrichment Function. Extracts all possible named entity types. Adds the extracted entities to the source dataset. NLP-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; fox:relatedTo :BootsCompany.
  • 31. Atomic Enrichment Functions III: Predicate conformation atomic enrichment function. Enriched datasets may contain diverse ontologies. Predicate conformation maps a set of pre-specified predicates to a target ontology. [Figure: in the CBD of :Ibuprofen (a :Drug; owl:sameAs db:Ibuprofen; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."), the predicate fox:relatedTo on the link to :BootsCompany is replaced by :relatedCompany]
  • 34. Self-Configuration III: Predicate conformation Enrichment Function. Finds the lists of predicates Ps and Pt from the source and target datasets, respectively, that share the same subjects and objects. Replaces each Ps with its respective Pt. NLP-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; fox:relatedTo :BootsCompany. Enriched CBD of Ibuprofen (positive example target): :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :relatedCompany :BootsCompany. Result: fox:relatedTo is replaced by :relatedCompany.
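One way to picture this self-configuration is to pair predicates that connect the same subject and object in the source and target CBDs. The sketch below rests on that reading of the slide; `learn_predicate_mapping` and `conform` are hypothetical names, and picking the alphabetically first candidate on ambiguity is an assumption made only to keep the example deterministic.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def learn_predicate_mapping(source: Set[Triple], target: Set[Triple]) -> Dict[str, str]:
    """Map each source predicate Ps to a target predicate Pt with the same subject and object."""
    target_by_so: Dict[Tuple[str, str], Set[str]] = {}
    for s, p, o in target:
        target_by_so.setdefault((s, o), set()).add(p)
    mapping: Dict[str, str] = {}
    for s, p, o in source:
        candidates = target_by_so.get((s, o), set())
        if p in candidates:
            continue  # predicate already conforms to the target
        for pt in sorted(candidates):
            mapping.setdefault(p, pt)
    return mapping


def conform(kb: Set[Triple], mapping: Dict[str, str]) -> Set[Triple]:
    """Replace each mapped predicate, leaving all other triples untouched."""
    return {(s, mapping.get(p, p), o) for s, p, o in kb}


common = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Ibuprofen", "rdfs:comment", COMMENT),
}
nlp_enriched = common | {(":Ibuprofen", "fox:relatedTo", ":BootsCompany")}
target = common | {(":Ibuprofen", ":relatedCompany", ":BootsCompany")}

mapping = learn_predicate_mapping(nlp_enriched, target)
conformed = conform(nlp_enriched, mapping)
```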
  • 37. KB Enrichment Refinement Operator. Input: a set of atomic enrichment functions $\mathcal{M}$ and a set of positive examples $E$. Refinement operator: $\rho(M) = \bigcup_{\forall m \in \mathcal{M}} M \mathbin{+\!+} m$, where $\mathbin{+\!+}$ is the list append operator. Output: an enrichment pipeline $M$.
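Read literally, the refinement operator returns one new pipeline per atomic function, each obtained by appending that function to the current pipeline. A minimal sketch, assuming pipelines are tuples and atomic functions are identified by hypothetical names "m1", "m2", "m3":

```python
from typing import List, Sequence, Tuple

Pipeline = Tuple[str, ...]


def refine(pipeline: Pipeline, atomic_functions: Sequence[str]) -> List[Pipeline]:
    """rho(M): every pipeline obtained by appending one atomic function to M."""
    return [pipeline + (m,) for m in atomic_functions]


atomic = ["m1", "m2", "m3"]
first_level = refine((), atomic)
second_level = refine(("m1",), atomic)
```

Note that this literal reading also yields pipelines that repeat a function, such as ("m1", "m1"); whether such repetitions are pruned is not stated on the slide.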
  • 40. Positive Example. Non-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen. Enriched CBD of Ibuprofen: :Ibuprofen a :Drug; owl:sameAs db:Ibuprofen; :relatedCompany :BootsCompany; rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...".
  • 48. Learning Algorithm
      1. Start with an empty enrichment pipeline M = ⊥
      2. Self-configure all m_i ∈ M and add each as a child of ⊥
      3. Select the most promising node
      4. Expand the most promising node
      Refinement tree after three expansions:
      ⊥
        (m1)
          (m1, m2)
          (m1, m3)
        (m2)
        (m3)
          (m3, m1)
          (m3, m2)
            (m3, m2, m1)
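The four steps of the learning algorithm can be sketched roughly as follows. This is a minimal illustration, not the DEER implementation: the names `Node`, `expand`, `leaves` and `learn` are hypothetical, self-configuration of the enrichment functions is omitted, the scoring function is passed in as a parameter, and expansion is assumed to append every function not yet used in the pipeline (which matches the tree shown on the slide but is not stated there).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# A pipeline is modelled as an ordered tuple of enrichment function names (assumption).
Pipeline = Tuple[str, ...]


@dataclass
class Node:
    """One node of the refinement tree; the root holds the empty pipeline."""
    pipeline: Pipeline
    children: List["Node"] = field(default_factory=list)


def expand(node: Node, functions: Sequence[str]) -> None:
    """Add one child per function that is not already part of the node's pipeline."""
    for name in functions:
        if name not in node.pipeline:
            node.children.append(Node(node.pipeline + (name,)))


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes of the tree in depth-first order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def learn(functions: Sequence[str],
          score: Callable[[Pipeline], float],
          max_expansions: int) -> Node:
    root = Node(())            # step 1: empty pipeline
    expand(root, functions)    # step 2: one child per atomic function
    for _ in range(max_expansions):
        # Only leaves that can still grow are candidates (assumption of this sketch).
        candidates = [n for n in leaves(root) if len(n.pipeline) < len(functions)]
        if not candidates:
            break
        best = max(candidates, key=lambda n: score(n.pipeline))  # step 3
        expand(best, functions)                                   # step 4
    return root
```

With three functions and a toy score that favours (m1), then (m3), then (m3, m2), three expansions yield the leaves (m1, m2), (m1, m3), (m2), (m3, m1) and (m3, m2, m1).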
  • 49. Most Promising Node Selection
      Node complexity c(n): linear combination of the node's children count and level
      Node fitness f(n): difference between the F-measure of the node's enrichment pipeline and its weighted complexity, f(n) = F(n) − ω·c(n)
      ω controls the tradeoff between greedy search (ω = 0) and search strategies closer to breadth-first search (ω > 0)
      Most promising node: the leaf node with the maximum fitness in the whole refinement tree
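A small sketch of the fitness-based selection, under stated assumptions: the slide only says that c(n) is a linear combination of children count and level, so the weights `child_weight` and `level_weight` below are hypothetical, as are the names `NodeStats`, `complexity`, `fitness` and `most_promising`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NodeStats:
    name: str
    f_measure: float  # F(n) of the node's enrichment pipeline
    children: int     # number of children of the node
    level: int        # depth of the node in the refinement tree
    is_leaf: bool


def complexity(n: NodeStats, child_weight: float = 1.0, level_weight: float = 1.0) -> float:
    """c(n): linear combination of children count and level (weights are assumed)."""
    return child_weight * n.children + level_weight * n.level


def fitness(n: NodeStats, omega: float) -> float:
    """f(n) = F(n) - omega * c(n)."""
    return n.f_measure - omega * complexity(n)


def most_promising(nodes: Iterable[NodeStats], omega: float) -> NodeStats:
    """Return the leaf node with the maximum fitness."""
    return max((n for n in nodes if n.is_leaf), key=lambda n: fitness(n, omega))


# Made-up values for illustration only.
deep = NodeStats("deep", f_measure=0.9, children=0, level=3, is_leaf=True)
shallow = NodeStats("shallow", f_measure=0.8, children=0, level=1, is_leaf=True)
```

With these made-up values, ω = 0 picks the deeper leaf purely on its F-measure (greedy behaviour), while ω = 0.1 penalizes depth enough that the shallower leaf is selected, which pushes the search toward breadth-first exploration.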
  • 53. Experimental Setup
      Datasets
        1 manual experimental enrichment pipeline for Jamendo
        2 manual experimental enrichment pipelines for DrugBank
        5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion)
      Learning algorithm
        6 atomic enrichment functions
        Termination criteria: a maximum of 10 iterations, or an optimal enrichment pipeline found (F-score = 1)
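The termination criteria can be illustrated with a short sketch. It assumes that the F-score of a pipeline is computed by comparing the set of triples it produces with the triples of the target example; the slides do not spell this out, and the function names `prf` and `should_stop` are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def prf(produced: Set[Triple], expected: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of produced triples against expected ones."""
    true_positives = len(produced & expected)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def should_stop(iteration: int, best_f: float, max_iterations: int = 10) -> bool:
    """Stop after max_iterations, or as soon as an optimal pipeline (F = 1) is found."""
    return iteration >= max_iterations or best_f >= 1.0
```

The default of 10 iterations mirrors the setting reported on the slide; everything else is an illustrative choice.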
  • 55. Configuration of the Search Strategy
      Node fitness: f(n) = F(n) − ω·c(n)
      ω controls the tradeoff between greedy search (ω = 0) and search strategies closer to breadth-first search (ω > 0)
      Result: ω = 0.75 leads to the best results

      ω      P     R     F
      0      1.0   0.99  0.99
      0.25   1.0   0.99  0.99
      0.50   1.0   0.99  0.99
      0.75   1.0   1.0   1.0
      1.0    1.0   0.99  0.99
  • 56. Effect of Positive Examples

      Manual        Examples  Size of  Time    Size of    Time    Learn  Iterations
      M             count     M        M(KB)   learned M  M(KB)   time   count       F-score
      M1 DBpedia    1         1        0.2     1          1.6     1.3    1           1.0
                    2         1        0.2     1          1.8     1.3    1           1.0
      M2 DBpedia    1         2        23.3    1          0.1     0.2    1           0.99
                    2         2        15      2          17      0.3    9           0.99
      M3 DBpedia    1         3        14.7    3          15.2    6.1    9           0.99
                    2         3        15      2          15.1    0.1    9           0.99
      M4 DBpedia    1         4        0.4     2          0.1     0.7    2           0.99
                    2         4        0.6     2          0.3     0.9    2           0.99
      M5 DBpedia    1         5        22      2          0.1     0.7    2           1.0
                    2         5        25.5    2          0.2     0.9    2           1.0
      M1 DrugBank   1         2        3.5     1          4.1     0.1    10          0.99
                    2         2        3.6     1          3.4     0.1    10          0.99
      M2 DrugBank   1         3        25.2    1          0.1     0.1    10          0.99
                    2         3        22.8    1          0.1     0.1    10          0.99
      M1 Jamendo    1         1        10.9    2          10.6    0.1    2           0.99
                    2         1        10.4    2          10.4    0.1    1           0.99
  • 58. Conclusion and Future Work
      Conclusion
        Presented self-configuring atomic enrichment functions
        Presented an approach for learning enrichment pipelines based on a refinement operator
        Showed that our approach can easily reconstruct manually created enrichment pipelines
      Future Work
        Parallelize the algorithm across several CPUs and add load balancing
        Support directed acyclic graphs as enrichment specifications by allowing datasets to be split and merged
        Pro-active enrichment strategies and active learning
  • 60. Thank You! Questions?
      Mohamed Sherif
      Augustusplatz 10, D-04109 Leipzig
      sherif@informatik.uni-leipzig.de
      http://aksw.org/MohamedSherif
      http://aksw.org/Projects/DEER
      #akswgroup