DEER - RDF Data Extraction and Enrichment Framework

Mohamed Sherif
Mohamed SherifPhD Student at Leipzig University
Motivation Approach Evaluation Conclusion and Future Work
DEER
RDF Data Extraction and Enrichment Framework
Mohamed Ahmed Sherif
September 15, 2015
Mohamed Ahmed Sherif — DEER 1/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif — DEER 2/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif — DEER 3/31
Motivation Approach Evaluation Conclusion and Future Work
RDF Extraction & Enrichment
Need for enriched datasets
Tourism
Question Answering
Enhanced Reality
...
RDF extraction and enrichment
Triples to be added to the original
KB and/or
Triples to be deleted from the
original KB
Mohamed Ahmed Sherif — DEER 4/31
Motivation Approach Evaluation Conclusion and Future Work
Example
GeoSuPhat
mo:MusicArtist
... artist name of George (Pete) Peterson. Born and raised
near the shores of Lake Michigan and its cold windy winters
he was Influenced by bands like Pink Floyd, Kraftwerk, Todd
Rundgren, ELP, Hawkwind and his love of computers and
synthesis. After serving in the military and a few years
going to college in Colorado where he studied electronics
and computer programming. George moved to Europe and
settled down in Germany where he had met his wife
Christine. ...
jamendo:336993
foaf:name
a
mo:biography
Mohamed Ahmed Sherif — DEER 5/31
Motivation Approach Evaluation Conclusion and Future Work
DEER Enrichment Functions
Idea
Generate enrichment data based on existing data
Input/output are RDF datasets
Current Enrichment Functions
Dereferencing
Linking
NLP
Conformation
Filter
Mohamed Ahmed Sherif — DEER 6/31
Motivation Approach Evaluation Conclusion and Future Work
DEER Enrichment Operators
Idea
Create complex workflows by connecting modules
Input/output are RDF datasets
Current Enrichment Operators
Clone
Merge
Mohamed Ahmed Sherif — DEER 7/31
Motivation Approach Evaluation Conclusion and Future Work
DEER Specification Paradigm
d1
Dereferencing
d2
cloned3
NLP
d4
Filter
d5 d6Merge
d7
AuthorityConform
d8
Mohamed Ahmed Sherif — DEER 8/31
Motivation Approach Evaluation Conclusion and Future Work
Manual DEER Specification
1 @ p r e f i x : <http :// geoknow . org / s p e c s o n t o l o g y/> .
2 @ p r e f i x r d f s : <http ://www. w3 . org /2000/01/ rdf−schema#> .
3 @ p r e f i x geo : <http ://www. w3 . org /2003/01/ geo/ wgs84 pos#> .
4 : d1 a : Dataset ;
5 : hasUri <http :// dbpedia . org / r e s o u r c e / B e r l i n> ;
6 : fromEndPoint <http :// dbpedia . org / sparql> .
7 : d2 a : Dataset .
8 : d3 a : Dataset .
9 : d4 a : Dataset .
10 : d5 a : Dataset .
11 : d6 a : Dataset .
12 : d7 a : Dataset .
13 : d8 a : Dataset ;
14 : o u t p u t F i l e ” D e e r B e r l i n . t t l ” ;
15 : outputFormat ” T u r t l e ” .
16 : d e r e f a : Module , : DereferencingModule ;
17 r d f s : l a b e l ” D e r e f e r e n c i n g module” ;
18 : hasInput : d1 ;
19 : hasOutput : d2 ;
20 : hasParameter : derefParam1 .
21 : derefParam1 a : ModuleParameter , : DereferencingModuleParameter ;
22 : hasKey ” i n p u t P r o p e r t y 1 ” ;
23 : hasValue geo : l a t .
24 : c l o n e a : Operator , : CloneOperator ;
25 r d f s : l a b e l ” Clone o p e r a t o r ” ;
26 : hasInput : d2 ;
27 : hasOutput : d3 , : d4 .
28 : nlp a : Module , : NLPModule ;
29 r d f s : l a b e l ”NLP module” ;
30 : hasInput : d3 ;
31 : hasOutput : d5 ;
32 : hasParameter : nlpPram1 , : nlpPram2 .Mohamed Ahmed Sherif — DEER 9/31
Motivation Approach Evaluation Conclusion and Future Work
Manual KB Enrichment
Manual customized enrichment pipelines
⊕ Leads to the expected results
Time consuming
Cannot be ported easily to other datasets
Mohamed Ahmed Sherif — DEER 10/31
Motivation Approach Evaluation Conclusion and Future Work
Automatic KB Enrichment
Enrichment pipeline M : K → K that maps KB K to an
enriched KB K with K = M(K).
M is an ordered list of atomic enrichment functions
m ∈ M
M =
φ if K = K ,
(m1, . . . , mn), where mi ∈ M, 1 ≤ i ≤ n otherwise.
Research questions
1 How to create self-configuring atomic enrichment
functions m ∈ M?
2 How to automatically generate an enrichment pipeline M?
Mohamed Ahmed Sherif — DEER 11/31
Motivation Approach Evaluation Conclusion and Future Work
Automatic KB Enrichment
Enrichment pipeline M : K → K that maps KB K to an
enriched KB K with K = M(K).
M is an ordered list of atomic enrichment functions
m ∈ M
M =
φ if K = K ,
(m1, . . . , mn), where mi ∈ M, 1 ≤ i ≤ n otherwise.
Research questions
1 How to create self-configuring atomic enrichment
functions m ∈ M?
2 How to automatically generate an enrichment pipeline M?
Mohamed Ahmed Sherif — DEER 11/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif — DEER 12/31
Motivation Approach Evaluation Conclusion and Future Work
Running Example
Dataset DrugBank
Goal Gather information about companies related to drugs for
a market study
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
:Drug
a
a
a
a
Mohamed Ahmed Sherif — DEER 13/31
Motivation Approach Evaluation Conclusion and Future Work
Running Example
Dataset DrugBank
Goal Gather information about companies related to drugs for
a market study
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 13/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-specified set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
:Drug
a
a
a
a
Mohamed Ahmed Sherif — DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-specified set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-specified set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:commentrdfs:comment
Mohamed Ahmed Sherif — DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-specified set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:commentrdfs:comment
rdfs:comment
Mohamed Ahmed Sherif — DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif — DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif — DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif — DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif — DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 18/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 18/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 18/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of a pre-specified
predicates to a target ontology
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 19/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of a pre-specified
predicates to a target ontology
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 19/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of a pre-specified
predicates to a target ontology
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 19/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
III. Predicate conformation Enrichment Function
Finds list of predicates Ps and Pt from the source resp.
target datasets with the same subject and objects
Changes each Ps with its respective Pt
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen (positive example target)
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 20/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
III. Predicate conformation Enrichment Function
Finds list of predicates Ps and Pt from the source resp.
target datasets with the same subject and objects
Changes each Ps with its respective Pt
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen (positive example target)
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 20/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Configuration
III. Predicate conformation Enrichment Function
Finds list of predicates Ps and Pt from the source resp.
target datasets with the same subject and objects
Changes each Ps with its respective Pt
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo:relatedCompany
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen (positive example target)
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif — DEER 20/31
Motivation Approach Evaluation Conclusion and Future Work
KB Enrichment Refinement Operator
Input
Set of atomic enrichment functions M
Set of positive examples E
Refinement Operator
ρ(M) =
∀m∈M
M ++ m ( ++ is the list append operator)
Output
Enrichment pipeline M
Mohamed Ahmed Sherif — DEER 21/31
Motivation Approach Evaluation Conclusion and Future Work
KB Enrichment Refinement Operator
Input
Set of atomic enrichment functions M
Set of positive examples E
Refinement Operator
ρ(M) =
∀m∈M
M ++ m ( ++ is the list append operator)
Output
Enrichment pipeline M
Mohamed Ahmed Sherif — DEER 21/31
Motivation Approach Evaluation Conclusion and Future Work
KB Enrichment Refinement Operator
Input
Set of atomic enrichment functions M
Set of positive examples E
Refinement Operator
ρ(M) =
∀m∈M
M ++ m ( ++ is the list append operator)
Output
Enrichment pipeline M
Mohamed Ahmed Sherif — DEER 21/31
Motivation Approach Evaluation Conclusion and Future Work
Positive Example
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen
Mohamed Ahmed Sherif — DEER 22/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
(m1) (m2) (m3)
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
(m1) (m2) (m3)
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
(m1) (m2) (m3)
(m1, m2) (m1, m3)
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
(m1) (m2) (m3)
(m1, m2) (m1, m3)
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
(m1) (m2) (m3)
(m1, m2) (m1, m3) (m3, m1) (m3, m2)
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
(m1) (m2) (m3)
(m1, m2) (m1, m3) (m3, m1) (m3, m2)
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = ⊥
2 Self-configure all mi ∈ M, add as child to ⊥
3 Select most promising node
4 Expand most promising node
⊥
(m1) (m2) (m3)
(m1, m2) (m1, m3) (m3, m1) (m3, m2)
(m3, m2, m1)
Mohamed Ahmed Sherif — DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Most Promising Node Selection
Node complexity c(n)
Linear combination of the node’s children count and level
Node fitness f (n)
Difference between node’s enrichment pipeline F-measure
and weighted complexity, f (n) = F(n) − ω.c(n)
ω controls the tradeoff between
Greedy search (ω = 0)
Search strategies closer to breadth-first search (ω > 0).
Most promising node
The leaf node with the maximum fitness through the
whole refinement tree
Mohamed Ahmed Sherif — DEER 24/31
Motivation Approach Evaluation Conclusion and Future Work
Most Promising Node Selection
Node complexity c(n)
Linear combination of the node’s children count and level
Node fitness f (n)
Difference between node’s enrichment pipeline F-measure
and weighted complexity, f (n) = F(n) − ω.c(n)
ω controls the tradeoff between
Greedy search (ω = 0)
Search strategies closer to breadth-first search (ω > 0).
Most promising node
The leaf node with the maximum fitness through the
whole refinement tree
Mohamed Ahmed Sherif — DEER 24/31
Motivation Approach Evaluation Conclusion and Future Work
Most Promising Node Selection
Node complexity c(n)
Linear combination of the node’s children count and level
Node fitness f (n)
Difference between node’s enrichment pipeline F-measure
and weighted complexity, f (n) = F(n) − ω.c(n)
ω controls the tradeoff between
Greedy search (ω = 0)
Search strategies closer to breadth-first search (ω > 0).
Most promising node
The leaf node with the maximum fitness through the
whole refinement tree
Mohamed Ahmed Sherif — DEER 24/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif — DEER 25/31
Motivation Approach Evaluation Conclusion and Future Work
Experimental Setup
Datasets
1 manual experimental enrichment pipelines for Jamendo
2 manual experimental enrichment pipelines for DrugBank
5 manual experimental enrichment pipelines for DBpedia
(AdministrativeRegion)
Learning Algorithm
6 atomic enrichment functions
Termination criterion:
Maximum number of iterations of 10
Optimal enrichment pipeline found (F-score = 1)
Mohamed Ahmed Sherif — DEER 26/31
Motivation Approach Evaluation Conclusion and Future Work
Experimental Setup
Datasets
1 manual experimental enrichment pipelines for Jamendo
2 manual experimental enrichment pipelines for DrugBank
5 manual experimental enrichment pipelines for DBpedia
(AdministrativeRegion)
Learning Algorithm
6 atomic enrichment functions
Termination criterion:
Maximum number of iterations of 10
Optimal enrichment pipeline found (F-score = 1)
Mohamed Ahmed Sherif — DEER 26/31
Motivation Approach Evaluation Conclusion and Future Work
Configuration of the Search Strategy
Node fitness
f (n) = F(n) − ω.c(n)
ω controls the tradeoff between
Greedy search (ω = 0)
Search strategies closer to
breadth first search (ω > 0).
Result: ω = 0.75 leads to the
best results
ω P R F
0 1.0 0.99 0.99
0.25 1.0 0.99 0.99
0.50 1.0 0.99 0.99
0.75 1.0 1.0 1.0
1.0 1.0 0.99 0.99
Mohamed Ahmed Sherif — DEER 27/31
Motivation Approach Evaluation Conclusion and Future Work
Effect of Positive Examples
Manual Examples Size of Time Size of Time Learn Iterations
M count M M(KB) learned
M
M (KB) Time count F-score
M1
DBpedia
1 1 0.2 1 1.6 1.3 1 1.0
2 1 0.2 1 1.8 1.3 1 1.0
M2
DBpedia
1 2 23.3 1 0.1 0.2 1 0.99
2 2 15 2 17 0.3 9 0.99
M3
DBpedia
1 3 14.7 3 15.2 6.1 9 0.99
2 3 15 2 15.1 0.1 9 0.99
M4
DBpedia
1 4 0.4 2 0.1 0.7 2 0.99
2 4 0.6 2 0.3 0.9 2 0.99
M5
DBpedia
1 5 22 2 0.1 0.7 2 1.0
2 5 25.5 2 0.2 0.9 2 1.0
M1
DrugBank
1 2 3.5 1 4.1 0.1 10 0.99
2 2 3.6 1 3.4 0.1 10 0.99
M2
DrugBank
1 3 25.2 1 0.1 0.1 10 0.99
2 3 22.8 1 0.1 0.1 10 0.99
M1
Jamendo
1 1 10.9 2 10.6 0.1 2 0.99
2 1 10.4 2 10.4 0.1 1 0.99
Mohamed Ahmed Sherif — DEER 28/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif — DEER 29/31
Motivation Approach Evaluation Conclusion and Future Work
Conclusion and Future Work
Conclusion
Introduced DEER
Presented self-configuring atomic enrichment functions
Presented an approach for learning enrichment pipelines
based on a refinement operator
Future Work
Parallelize the algorithm on several CPUs as well as load
balancing
Support directed acyclic graphs as enrichment
specifications by allowing to split and merge datasets
Pro-active enrichment strategies and active learning
Implement more enrichment function and operators
Mohamed Ahmed Sherif — DEER 30/31
Motivation Approach Evaluation Conclusion and Future Work
Conclusion and Future Work
Conclusion
Introduced DEER
Presented self-configuring atomic enrichment functions
Presented an approach for learning enrichment pipelines
based on a refinement operator
Future Work
Parallelize the algorithm on several CPUs as well as load
balancing
Support directed acyclic graphs as enrichment
specifications by allowing to split and merge datasets
Pro-active enrichment strategies and active learning
Implement more enrichment function and operators
Mohamed Ahmed Sherif — DEER 30/31
Motivation Approach Evaluation Conclusion and Future Work
Thank You! Questions?
Mohamed Sherif
Augustusplatz 10
D-04109 Leipzig
sherif@informatik.uni-leipzig.de
http://aksw.org/MohamedSherif
http://aksw.org/Projects/DEER
Automating RDF Dataset Transformation and Enrichment by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga
Ngomo, and Jens Lehmann in 12th Extended Semantic Web Conference, Portoroz, Slovenia, 2015
Mohamed Ahmed Sherif — DEER 31/31
1 of 64

Recommended

Dependency Parsing-based QA System for RDF and SPARQL by
Dependency Parsing-based QA System for RDF and SPARQLDependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLFariz Darari
1.5K views56 slides
Cells part 4 pro vs. euk by
Cells part 4 pro vs. eukCells part 4 pro vs. euk
Cells part 4 pro vs. eukecrooke
326 views3 slides
Stage marketing pour étudiants du SAWI by
Stage marketing pour étudiants du SAWI Stage marketing pour étudiants du SAWI
Stage marketing pour étudiants du SAWI Nicolas Nervi
221 views1 slide
Desimlocker iphone by
Desimlocker iphoneDesimlocker iphone
Desimlocker iphonebrokenobstructi48
53 views1 slide
Recursos didácticos 1... by
Recursos didácticos 1...Recursos didácticos 1...
Recursos didácticos 1...Martha Sanchez
168 views8 slides
Proyecto educativo 1 by
Proyecto educativo 1Proyecto educativo 1
Proyecto educativo 1manuela villalobos
305 views2 slides

More Related Content

Viewers also liked

Apuntes segundo parcial by
Apuntes segundo parcialApuntes segundo parcial
Apuntes segundo parcialluisin777
235 views5 slides
DEER - Automating RDF Dataset Transformation and Enrichment by
DEER - Automating RDF Dataset Transformation and EnrichmentDEER - Automating RDF Dataset Transformation and Enrichment
DEER - Automating RDF Dataset Transformation and EnrichmentMohamed Sherif
3.5K views60 slides
Gerencia de proyectos by
Gerencia de proyectosGerencia de proyectos
Gerencia de proyectosGONZALO MENDOZA
164 views4 slides
Cáncer de próstata by
Cáncer de próstataCáncer de próstata
Cáncer de próstatafont Fawn
411 views16 slides
Endogamia e heterose by
Endogamia e heteroseEndogamia e heterose
Endogamia e heteroseCatarina Maria Dall Agnol
7.4K views19 slides
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas by
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark ThomasFeeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark ThomasDAIReXNET
4.2K views61 slides

Viewers also liked(10)

Apuntes segundo parcial by luisin777
Apuntes segundo parcialApuntes segundo parcial
Apuntes segundo parcial
luisin777235 views
DEER - Automating RDF Dataset Transformation and Enrichment by Mohamed Sherif
DEER - Automating RDF Dataset Transformation and EnrichmentDEER - Automating RDF Dataset Transformation and Enrichment
DEER - Automating RDF Dataset Transformation and Enrichment
Mohamed Sherif3.5K views
Cáncer de próstata by font Fawn
Cáncer de próstataCáncer de próstata
Cáncer de próstata
font Fawn411 views
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas by DAIReXNET
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark ThomasFeeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
DAIReXNET4.2K views
A guide to dairy calf feeding and management by vincentwambua
A guide to dairy calf feeding and managementA guide to dairy calf feeding and management
A guide to dairy calf feeding and management
vincentwambua196 views
Power point Presentation on SEVAI - COW PROJECT, . by sevaingo
Power point Presentation on SEVAI - COW PROJECT, .Power point Presentation on SEVAI - COW PROJECT, .
Power point Presentation on SEVAI - COW PROJECT, .
sevaingo7.4K views
Natural Parks in the U.S. by Mallorca Vegas
Natural Parks in the U.S.Natural Parks in the U.S.
Natural Parks in the U.S.
Mallorca Vegas1.3K views
Avoiding Disease in Dairy Calves by DAIReXNET
Avoiding Disease in Dairy CalvesAvoiding Disease in Dairy Calves
Avoiding Disease in Dairy Calves
DAIReXNET2.5K views

Similar to DEER - RDF Data Extraction and Enrichment Framework

Unit 5 research structure template by
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure templateRob Fogerty
430 views32 slides
Unit 5 research structure template by
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure templateRob Fogerty
1.9K views32 slides
Unit 5 research structure template referral 1/2 by
Unit 5 research structure template referral 1/2 Unit 5 research structure template referral 1/2
Unit 5 research structure template referral 1/2 Rob Fogerty
200 views32 slides
Unit 5 research structure template referral by
Unit 5 research structure template referralUnit 5 research structure template referral
Unit 5 research structure template referralRob Fogerty
374 views32 slides
The Search and Hyperlinking Task at MediaEval 2014 by
The Search and Hyperlinking Task at MediaEval 2014The Search and Hyperlinking Task at MediaEval 2014
The Search and Hyperlinking Task at MediaEval 2014multimediaeval
291 views39 slides
Efficient source selection for sparql endpoint federation by
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationMuhammad Saleem
642 views113 slides

Similar to DEER - RDF Data Extraction and Enrichment Framework(20)

Unit 5 research structure template by Rob Fogerty
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure template
Rob Fogerty430 views
Unit 5 research structure template by Rob Fogerty
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure template
Rob Fogerty1.9K views
Unit 5 research structure template referral 1/2 by Rob Fogerty
Unit 5 research structure template referral 1/2 Unit 5 research structure template referral 1/2
Unit 5 research structure template referral 1/2
Rob Fogerty200 views
Unit 5 research structure template referral by Rob Fogerty
Unit 5 research structure template referralUnit 5 research structure template referral
Unit 5 research structure template referral
Rob Fogerty374 views
The Search and Hyperlinking Task at MediaEval 2014 by multimediaeval
The Search and Hyperlinking Task at MediaEval 2014The Search and Hyperlinking Task at MediaEval 2014
The Search and Hyperlinking Task at MediaEval 2014
multimediaeval291 views
Efficient source selection for sparql endpoint federation by Muhammad Saleem
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
Muhammad Saleem642 views
Federated Query Formulation and Processing Through BioFed by Muhammad Saleem
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
Muhammad Saleem299 views
Alex Tellez, Deep Learning Applications by Sri Ambati
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning Applications
Sri Ambati4.2K views
Search and Hyperlinking Overview @MediaEval2014 by Maria Eskevich
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014
Maria Eskevich395 views
Metadata Provenance by Kai Eckert
Metadata ProvenanceMetadata Provenance
Metadata Provenance
Kai Eckert1.7K views
Entity Summarization with User Feedback (ESWC 2020) by Qingxia Liu
Entity Summarization with User Feedback (ESWC 2020)Entity Summarization with User Feedback (ESWC 2020)
Entity Summarization with User Feedback (ESWC 2020)
Qingxia Liu71 views
Entities, Graphs, and Crowdsourcing for better Web Search by eXascale Infolab
Entities, Graphs, and Crowdsourcing for better Web SearchEntities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web Search
eXascale Infolab790 views
Market Research Meets Big Data Analytics for Business Transformation by Sally Sadosky
Market Research Meets Big Data Analytics  for Business Transformation Market Research Meets Big Data Analytics  for Business Transformation
Market Research Meets Big Data Analytics for Business Transformation
Sally Sadosky2.6K views
Federated SPARQL Query Processing ISWC2015 Tutorial by Muhammad Saleem
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
Muhammad Saleem1.1K views
Extending faceted search to the general web by Wei-Yuan Chang
Extending faceted search to the general webExtending faceted search to the general web
Extending faceted search to the general web
Wei-Yuan Chang790 views
Me4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles by Mariana Curado Malta
Me4DCAP V0.1: a Method for the Development of Dublin Core Application ProfilesMe4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles
Me4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles
Data council sf amundsen presentation by Tao Feng
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Tao Feng2.8K views

Recently uploaded

SUMIT SQL PROJECT SUPERSTORE 1.pptx by
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptxSumit Jadhav
18 views26 slides
fakenews_DBDA_Mar23.pptx by
fakenews_DBDA_Mar23.pptxfakenews_DBDA_Mar23.pptx
fakenews_DBDA_Mar23.pptxdeepmitra8
16 views34 slides
Design_Discover_Develop_Campaign.pptx by
Design_Discover_Develop_Campaign.pptxDesign_Discover_Develop_Campaign.pptx
Design_Discover_Develop_Campaign.pptxShivanshSeth6
37 views20 slides
MK__Cert.pdf by
MK__Cert.pdfMK__Cert.pdf
MK__Cert.pdfHassan Khan
15 views1 slide
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf by
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdfASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdfAlhamduKure
6 views11 slides
sam_software_eng_cv.pdf by
sam_software_eng_cv.pdfsam_software_eng_cv.pdf
sam_software_eng_cv.pdfsammyigbinovia
8 views5 slides

Recently uploaded(20)

SUMIT SQL PROJECT SUPERSTORE 1.pptx by Sumit Jadhav
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptx
Sumit Jadhav 18 views
fakenews_DBDA_Mar23.pptx by deepmitra8
fakenews_DBDA_Mar23.pptxfakenews_DBDA_Mar23.pptx
fakenews_DBDA_Mar23.pptx
deepmitra816 views
Design_Discover_Develop_Campaign.pptx by ShivanshSeth6
Design_Discover_Develop_Campaign.pptxDesign_Discover_Develop_Campaign.pptx
Design_Discover_Develop_Campaign.pptx
ShivanshSeth637 views
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf by AlhamduKure
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdfASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
AlhamduKure6 views
GDSC Mikroskil Members Onboarding 2023.pdf by gdscmikroskil
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdf
gdscmikroskil58 views
_MAKRIADI-FOTEINI_diploma thesis.pptx by fotinimakriadi
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptx
fotinimakriadi8 views
Ansari: Practical experiences with an LLM-based Islamic Assistant by M Waleed Kadous
Ansari: Practical experiences with an LLM-based Islamic AssistantAnsari: Practical experiences with an LLM-based Islamic Assistant
Ansari: Practical experiences with an LLM-based Islamic Assistant
M Waleed Kadous5 views
Searching in Data Structure by raghavbirla63
Searching in Data StructureSearching in Data Structure
Searching in Data Structure
raghavbirla6314 views
MongoDB.pdf by ArthyR3
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
ArthyR345 views
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx by lwang78
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
lwang78109 views
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc... by csegroupvn
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
csegroupvn5 views
Design of machine elements-UNIT 3.pptx by gopinathcreddy
Design of machine elements-UNIT 3.pptxDesign of machine elements-UNIT 3.pptx
Design of machine elements-UNIT 3.pptx
gopinathcreddy33 views
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) by Tsuyoshi Horigome
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)

DEER - RDF Data Extraction and Enrichment Framework

  • 1. Motivation Approach Evaluation Conclusion and Future Work DEER RDF Data Extraction and Enrichment Framework Mohamed Ahmed Sherif September 15, 2015 Mohamed Ahmed Sherif — DEER 1/31
  • 2. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif — DEER 2/31
  • 3. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif — DEER 3/31
  • 4. Motivation Approach Evaluation Conclusion and Future Work RDF Extraction & Enrichment Need for enriched datasets Tourism Question Answering Enhanced Reality ... RDF extraction and enrichment Triples to be added to the original KB and/or Triples to be deleted from the original KB Mohamed Ahmed Sherif — DEER 4/31
  • 5. Motivation Approach Evaluation Conclusion and Future Work Example GeoSuPhat mo:MusicArtist ... artist name of George (Pete) Peterson. Born and raised near the shores of Lake Michigan and its cold windy winters he was Influenced by bands like Pink Floyd, Kraftwerk, Todd Rundgren, ELP, Hawkwind and his love of computers and synthesis. After serving in the military and a few years going to college in Colorado where he studied electronics and computer programming. George moved to Europe and settled down in Germany where he had met his wife Christine. ... jamendo:336993 foaf:name a mo:biography Mohamed Ahmed Sherif — DEER 5/31
  • 6. Motivation Approach Evaluation Conclusion and Future Work DEER Enrichment Functions Idea Generate enrichment data based on existing data Input/output are RDF datasets Current Enrichment Functions Dereferencing Linking NLP Conformation Filter Mohamed Ahmed Sherif — DEER 6/31
  • 7. Motivation Approach Evaluation Conclusion and Future Work DEER Enrichment Operators Idea Create complex workflows by connecting modules Input/output are RDF datasets Current Enrichment Operators Clone Merge Mohamed Ahmed Sherif — DEER 7/31
  • 8. Motivation Approach Evaluation Conclusion and Future Work DEER Specification Paradigm d1 Dereferencing d2 cloned3 NLP d4 Filter d5 d6Merge d7 AuthorityConform d8 Mohamed Ahmed Sherif — DEER 8/31
  • 9. Motivation Approach Evaluation Conclusion and Future Work Manual DEER Specification 1 @ p r e f i x : <http :// geoknow . org / s p e c s o n t o l o g y/> . 2 @ p r e f i x r d f s : <http ://www. w3 . org /2000/01/ rdf−schema#> . 3 @ p r e f i x geo : <http ://www. w3 . org /2003/01/ geo/ wgs84 pos#> . 4 : d1 a : Dataset ; 5 : hasUri <http :// dbpedia . org / r e s o u r c e / B e r l i n> ; 6 : fromEndPoint <http :// dbpedia . org / sparql> . 7 : d2 a : Dataset . 8 : d3 a : Dataset . 9 : d4 a : Dataset . 10 : d5 a : Dataset . 11 : d6 a : Dataset . 12 : d7 a : Dataset . 13 : d8 a : Dataset ; 14 : o u t p u t F i l e ” D e e r B e r l i n . t t l ” ; 15 : outputFormat ” T u r t l e ” . 16 : d e r e f a : Module , : DereferencingModule ; 17 r d f s : l a b e l ” D e r e f e r e n c i n g module” ; 18 : hasInput : d1 ; 19 : hasOutput : d2 ; 20 : hasParameter : derefParam1 . 21 : derefParam1 a : ModuleParameter , : DereferencingModuleParameter ; 22 : hasKey ” i n p u t P r o p e r t y 1 ” ; 23 : hasValue geo : l a t . 24 : c l o n e a : Operator , : CloneOperator ; 25 r d f s : l a b e l ” Clone o p e r a t o r ” ; 26 : hasInput : d2 ; 27 : hasOutput : d3 , : d4 . 28 : nlp a : Module , : NLPModule ; 29 r d f s : l a b e l ”NLP module” ; 30 : hasInput : d3 ; 31 : hasOutput : d5 ; 32 : hasParameter : nlpPram1 , : nlpPram2 .Mohamed Ahmed Sherif — DEER 9/31
  • 10. Motivation Approach Evaluation Conclusion and Future Work Manual KB Enrichment Manual customized enrichment pipelines ⊕ Leads to the expected results Time consuming Cannot be ported easily to other datasets Mohamed Ahmed Sherif — DEER 10/31
  • 11. Motivation Approach Evaluation Conclusion and Future Work Automatic KB Enrichment Enrichment pipeline M : K → K that maps KB K to an enriched KB K with K = M(K). M is an ordered list of atomic enrichment functions m ∈ M M = φ if K = K , (m1, . . . , mn), where mi ∈ M, 1 ≤ i ≤ n otherwise. Research questions 1 How to create self-configuring atomic enrichment functions m ∈ M? 2 How to automatically generate an enrichment pipeline M? Mohamed Ahmed Sherif — DEER 11/31
  • 12. Motivation Approach Evaluation Conclusion and Future Work Automatic KB Enrichment Enrichment pipeline M : K → K that maps KB K to an enriched KB K with K = M(K). M is an ordered list of atomic enrichment functions m ∈ M M = φ if K = K , (m1, . . . , mn), where mi ∈ M, 1 ≤ i ≤ n otherwise. Research questions 1 How to create self-configuring atomic enrichment functions m ∈ M? 2 How to automatically generate an enrichment pipeline M? Mohamed Ahmed Sherif — DEER 11/31
  • 13. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif — DEER 12/31
  • 14. Motivation Approach Evaluation Conclusion and Future Work Running Example Dataset DrugBank Goal Gather information about companies related to drugs for a market study :Aspirin :Paracetamol :Ibuprofen :Quinine :Drug a a a a Mohamed Ahmed Sherif — DEER 13/31
  • 15. Motivation Approach Evaluation Conclusion and Future Work Running Example Dataset DrugBank Goal Gather information about companies related to drugs for a market study :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 13/31
  • 16. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-specified set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine :Drug a a a a Mohamed Ahmed Sherif — DEER 14/31
  • 17. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-specified set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 14/31
  • 18. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-specified set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:commentrdfs:comment Mohamed Ahmed Sherif — DEER 14/31
  • 19. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-specified set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:commentrdfs:comment rdfs:comment Mohamed Ahmed Sherif — DEER 14/31
  • 20. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif — DEER 15/31
  • 21. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif — DEER 15/31
  • 22. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif — DEER 15/31
  • 23. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif — DEER 15/31
  • 24. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 16/31
  • 25. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 16/31
  • 26. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 16/31
  • 27. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 16/31
  • 28. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 17/31
  • 29. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 17/31
  • 30. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 17/31
  • 31. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 17/31
  • 32. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration II. NLP Enrichment Function Extracts all possible named entity types Adds extracted entities to the source dataset NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 18/31
  • 33. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration II. NLP Enrichment Function Extracts all possible named entity types Adds extracted entities to the source dataset NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 18/31
  • 34. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration II. NLP Enrichment Function Extracts all possible named entity types Adds extracted entities to the source dataset NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 18/31
  • 35. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions III. Predicate conformation atomic enrichment function Enriched datasets may contain diverse ontologies Predicate conformation maps a set of a pre-specified predicates to a target ontology :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 19/31
  • 36. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions III. Predicate conformation atomic enrichment function Enriched datasets may contain diverse ontologies Predicate conformation maps a set of a pre-specified predicates to a target ontology :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 19/31
  • 37. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions III. Predicate conformation atomic enrichment function Enriched datasets may contain diverse ontologies Predicate conformation maps a set of a pre-specified predicates to a target ontology :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo:relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 19/31
  • 38. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration III. Predicate conformation Enrichment Function Finds list of predicates Ps and Pt from the source resp. target datasets with the same subject and objects Changes each Ps with its respective Pt NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Enriched CBD of Ibuprofen (positive example target) :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 20/31
  • 39. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration III. Predicate conformation Enrichment Function Finds list of predicates Ps and Pt from the source resp. target datasets with the same subject and objects Changes each Ps with its respective Pt NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Enriched CBD of Ibuprofen (positive example target) :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 20/31
  • 40. Motivation Approach Evaluation Conclusion and Future Work Self-Configuration III. Predicate conformation Enrichment Function Finds list of predicates Ps and Pt from the source resp. target datasets with the same subject and objects Changes each Ps with its respective Pt NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo:relatedCompany owl:sameAs rdfs:comment Enriched CBD of Ibuprofen (positive example target) :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif — DEER 20/31
  • 41. Motivation Approach Evaluation Conclusion and Future Work KB Enrichment Refinement Operator Input Set of atomic enrichment functions M Set of positive examples E Refinement Operator ρ(M) = ∀m∈M M ++ m ( ++ is the list append operator) Output Enrichment pipeline M Mohamed Ahmed Sherif — DEER 21/31
  • 42. Motivation Approach Evaluation Conclusion and Future Work KB Enrichment Refinement Operator Input Set of atomic enrichment functions M Set of positive examples E Refinement Operator ρ(M) = ∀m∈M M ++ m ( ++ is the list append operator) Output Enrichment pipeline M Mohamed Ahmed Sherif — DEER 21/31
  • 43. Motivation Approach Evaluation Conclusion and Future Work KB Enrichment Refinement Operator Input Set of atomic enrichment functions M Set of positive examples E Refinement Operator ρ(M) = ∀m∈M M ++ m ( ++ is the list append operator) Output Enrichment pipeline M Mohamed Ahmed Sherif — DEER 21/31
  • 44. Motivation Approach Evaluation Conclusion and Future Work Positive Example :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Enriched CBD of Ibuprofen Mohamed Ahmed Sherif — DEER 22/31
  • 45. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ Mohamed Ahmed Sherif — DEER 23/31
  • 46. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ (m1) (m2) (m3) Mohamed Ahmed Sherif — DEER 23/31
  • 47. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ (m1) (m2) (m3) Mohamed Ahmed Sherif — DEER 23/31
  • 48. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ (m1) (m2) (m3) (m1, m2) (m1, m3) Mohamed Ahmed Sherif — DEER 23/31
  • 49. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ (m1) (m2) (m3) (m1, m2) (m1, m3) Mohamed Ahmed Sherif — DEER 23/31
  • 50. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ (m1) (m2) (m3) (m1, m2) (m1, m3) (m3, m1) (m3, m2) Mohamed Ahmed Sherif — DEER 23/31
  • 51. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ (m1) (m2) (m3) (m1, m2) (m1, m3) (m3, m1) (m3, m2) Mohamed Ahmed Sherif — DEER 23/31
  • 52. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = ⊥ 2 Self-configure all mi ∈ M, add as child to ⊥ 3 Select most promising node 4 Expand most promising node ⊥ (m1) (m2) (m3) (m1, m2) (m1, m3) (m3, m1) (m3, m2) (m3, m2, m1) Mohamed Ahmed Sherif — DEER 23/31
  • 53. Motivation Approach Evaluation Conclusion and Future Work Most Promising Node Selection Node complexity c(n) Linear combination of the node’s children count and level Node fitness f (n) Difference between node’s enrichment pipeline F-measure and weighted complexity, f (n) = F(n) − ω.c(n) ω controls the tradeoff between Greedy search (ω = 0) Search strategies closer to breadth-first search (ω > 0). Most promising node The leaf node with the maximum fitness through the whole refinement tree Mohamed Ahmed Sherif — DEER 24/31
  • 54. Motivation Approach Evaluation Conclusion and Future Work Most Promising Node Selection Node complexity c(n) Linear combination of the node’s children count and level Node fitness f (n) Difference between node’s enrichment pipeline F-measure and weighted complexity, f (n) = F(n) − ω.c(n) ω controls the tradeoff between Greedy search (ω = 0) Search strategies closer to breadth-first search (ω > 0). Most promising node The leaf node with the maximum fitness through the whole refinement tree Mohamed Ahmed Sherif — DEER 24/31
  • 55. Motivation Approach Evaluation Conclusion and Future Work Most Promising Node Selection Node complexity c(n) Linear combination of the node’s children count and level Node fitness f (n) Difference between node’s enrichment pipeline F-measure and weighted complexity, f (n) = F(n) − ω.c(n) ω controls the tradeoff between Greedy search (ω = 0) Search strategies closer to breadth-first search (ω > 0). Most promising node The leaf node with the maximum fitness through the whole refinement tree Mohamed Ahmed Sherif — DEER 24/31
  • 56. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif — DEER 25/31
  • 57. Motivation Approach Evaluation Conclusion and Future Work Experimental Setup Datasets 1 manual experimental enrichment pipelines for Jamendo 2 manual experimental enrichment pipelines for DrugBank 5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion) Learning Algorithm 6 atomic enrichment functions Termination criterion: Maximum number of iterations of 10 Optimal enrichment pipeline found (F-score = 1) Mohamed Ahmed Sherif — DEER 26/31
  • 58. Motivation Approach Evaluation Conclusion and Future Work Experimental Setup Datasets 1 manual experimental enrichment pipelines for Jamendo 2 manual experimental enrichment pipelines for DrugBank 5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion) Learning Algorithm 6 atomic enrichment functions Termination criterion: Maximum number of iterations of 10 Optimal enrichment pipeline found (F-score = 1) Mohamed Ahmed Sherif — DEER 26/31
  • 59. Motivation Approach Evaluation Conclusion and Future Work Configuration of the Search Strategy Node fitness f (n) = F(n) − ω.c(n) ω controls the tradeoff between Greedy search (ω = 0) Search strategies closer to breadth first search (ω > 0). Result: ω = 0.75 leads to the best results ω P R F 0 1.0 0.99 0.99 0.25 1.0 0.99 0.99 0.50 1.0 0.99 0.99 0.75 1.0 1.0 1.0 1.0 1.0 0.99 0.99 Mohamed Ahmed Sherif — DEER 27/31
  • 60. Motivation Approach Evaluation Conclusion and Future Work Effect of Positive Examples Manual Examples Size of Time Size of Time Learn Iterations M count M M(KB) learned M M (KB) Time count F-score M1 DBpedia 1 1 0.2 1 1.6 1.3 1 1.0 2 1 0.2 1 1.8 1.3 1 1.0 M2 DBpedia 1 2 23.3 1 0.1 0.2 1 0.99 2 2 15 2 17 0.3 9 0.99 M3 DBpedia 1 3 14.7 3 15.2 6.1 9 0.99 2 3 15 2 15.1 0.1 9 0.99 M4 DBpedia 1 4 0.4 2 0.1 0.7 2 0.99 2 4 0.6 2 0.3 0.9 2 0.99 M5 DBpedia 1 5 22 2 0.1 0.7 2 1.0 2 5 25.5 2 0.2 0.9 2 1.0 M1 DrugBank 1 2 3.5 1 4.1 0.1 10 0.99 2 2 3.6 1 3.4 0.1 10 0.99 M2 DrugBank 1 3 25.2 1 0.1 0.1 10 0.99 2 3 22.8 1 0.1 0.1 10 0.99 M1 Jamendo 1 1 10.9 2 10.6 0.1 2 0.99 2 1 10.4 2 10.4 0.1 1 0.99 Mohamed Ahmed Sherif — DEER 28/31
  • 61. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif — DEER 29/31
  • 62. Motivation Approach Evaluation Conclusion and Future Work Conclusion and Future Work Conclusion Introduced DEER Presented self-configuring atomic enrichment functions Presented an approach for learning enrichment pipelines based on a refinement operator Future Work Parallelize the algorithm on several CPUs as well as load balancing Support directed acyclic graphs as enrichment specifications by allowing to split and merge datasets Pro-active enrichment strategies and active learning Implement more enrichment function and operators Mohamed Ahmed Sherif — DEER 30/31
  • 63. Motivation Approach Evaluation Conclusion and Future Work Conclusion and Future Work Conclusion Introduced DEER Presented self-configuring atomic enrichment functions Presented an approach for learning enrichment pipelines based on a refinement operator Future Work Parallelize the algorithm on several CPUs as well as load balancing Support directed acyclic graphs as enrichment specifications by allowing to split and merge datasets Pro-active enrichment strategies and active learning Implement more enrichment function and operators Mohamed Ahmed Sherif — DEER 30/31
  • 64. Motivation Approach Evaluation Conclusion and Future Work Thank You! Questions? Mohamed Sherif Augustusplatz 10 D-04109 Leipzig sherif@informatik.uni-leipzig.de http://aksw.org/MohamedSherif http://aksw.org/Projects/DEER Automating RDF Dataset Transformation and Enrichment by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann in 12th Extended Semantic Web Conference, Portoroz, Slovenia, 2015 Mohamed Ahmed Sherif — DEER 31/31