DEER - Automating RDF Dataset Transformation and Enrichment

With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim at supporting this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against 8 manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples.

  1. DEER - Automating RDF Dataset Transformation and Enrichment. Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann. June 3, 2015.
  2. Outline: (1) Motivation; (2) Approach; (3) Evaluation; (4) Conclusion and Future Work.
  5. Why RDF Transformation & Enrichment? Dataset: DrugBank. Goal: gather information about companies related to drugs for a market study. Figure: :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug; :Ibuprofen and :Aspirin are linked via owl:sameAs to db:Ibuprofen and db:Aspirin; db:Ibuprofen carries the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...".
  6. RDF Transformation & Enrichment. Need for enriched datasets: Tourism, Question Answering, Enhanced Reality, ... RDF transformation and enrichment: triples to be added to the original KB and/or triples to be deleted from the original KB.
  7. Manual Knowledge Base Enrichment. Enrichment demands the specification of data enrichment pipelines, which describe how data is to be integrated (usually manually). Manual, customized enrichment pipelines: (+) lead to the expected results; (-) are time-consuming; (-) cannot be ported easily to other datasets.
  9. Automatic Knowledge Base Enrichment. An enrichment pipeline $M : \mathcal{K} \to \mathcal{K}$ maps a KB $K$ to an enriched KB $K'$ with $K' = M(K)$. $M$ is an ordered list of atomic enrichment functions $m \in \mathcal{M}$: $M = \phi$ if $K = K'$, and $M = (m_1, \ldots, m_n)$ with $m_i \in \mathcal{M}$, $1 \le i \le n$, otherwise. Research questions: (1) How to create self-configuring atomic enrichment functions $m \in \mathcal{M}$? (2) How to automatically generate an enrichment pipeline $M$?
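The pipeline definition above can be read as plain function composition. The following is a minimal sketch, not DEER's implementation: a KB is modelled as a frozen set of (subject, predicate, object) string triples, and `add_label` and `drop_types` are hypothetical atomic functions invented for illustration.

```python
from functools import reduce
from typing import Callable, FrozenSet, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
EnrichmentFunction = Callable[[KB], KB]


def run_pipeline(pipeline: Sequence[EnrichmentFunction], kb: KB) -> KB:
    """Apply m_1, ..., m_n in order; the empty pipeline returns kb unchanged."""
    return reduce(lambda current, m: m(current), pipeline, kb)


def add_label(kb: KB) -> KB:
    # Hypothetical atomic function that adds one triple.
    return kb | {(":Ibuprofen", "rdfs:label", "Ibuprofen")}


def drop_types(kb: KB) -> KB:
    # Hypothetical atomic function that deletes all type ("a") triples.
    return frozenset(t for t in kb if t[1] != "a")


kb = frozenset({(":Ibuprofen", "a", ":Drug")})
enriched = run_pipeline([add_label, drop_types], kb)
```

The two toy functions mirror the two kinds of change named on the earlier slide: triples added to the KB and triples deleted from it.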
  15. Atomic Enrichment Functions I: Dereferencing atomic enrichment function. Datasets are linked (e.g., using owl:sameAs). The function dereferences a pre-specified set of predicates and adds the found predicates to the source dataset. Figure: :Ibuprofen and :Aspirin are linked via owl:sameAs to db:Ibuprofen and db:Aspirin; the rdfs:comment of db:Ibuprofen ("Ibuprofen was extracted by the research arm of Boots company during the 1960s ...") is copied to :Ibuprofen.
  16. Self-Configuration I: Dereferencing Enrichment Functions. Finds the set of predicates $D_p$ from the enriched CBDs that are missing from the source CBDs. Non-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen. Enriched CBD of Ibuprofen: the same triples plus :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." and :Ibuprofen :relatedCompany :BootsCompany. Hence $D_p$ = {:relatedCompany, rdfs:comment}.
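The self-configuration step on the slide above amounts to a set difference over predicates. A minimal sketch, assuming CBDs are given as sets of (subject, predicate, object) string triples; the function name `missing_predicates` is an illustrative choice, not a DEER API.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def missing_predicates(source_cbd: Iterable[Triple],
                       enriched_cbd: Iterable[Triple]) -> Set[str]:
    """Predicates used in the enriched CBD but absent from the source CBD."""
    source_predicates = {p for _, p, _ in source_cbd}
    return {p for _, p, _ in enriched_cbd} - source_predicates


COMMENT = ("Ibuprofen was extracted by the research arm of Boots company "
           "during the 1960s ...")

source_cbd = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
enriched_cbd = source_cbd | {
    (":Ibuprofen", "rdfs:comment", COMMENT),
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
}

d_p = missing_predicates(source_cbd, enriched_cbd)
```

For the Ibuprofen example this yields the same $D_p$ as the slide: :relatedCompany and rdfs:comment.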
  20. Self-Configuration I: Dereferencing Enrichment Functions (continued). Dereferences $D_p$ = {:relatedCompany, rdfs:comment} for the CBD of Ibuprofen. Finds only rdfs:comment and adds it to the source dataset. Dereferencing-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ...".
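The dereferencing step itself can be sketched as follows. This is an illustration under strong assumptions: the linked dataset is an in-memory set of triples named `external` (a real implementation would fetch the linked resources over the network), and `dereference` is a hypothetical helper name.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def dereference(kb: Set[Triple], external: Set[Triple],
                predicates: Iterable[str]) -> Set[Triple]:
    """Follow owl:sameAs links and copy the wanted predicates to the source."""
    wanted = set(predicates)
    added = set()
    for subject, predicate, linked in kb:
        if predicate != "owl:sameAs":
            continue
        for ext_subject, ext_predicate, ext_object in external:
            if ext_subject == linked and ext_predicate in wanted:
                added.add((subject, ext_predicate, ext_object))
    return set(kb) | added


COMMENT = ("Ibuprofen was extracted by the research arm of Boots company "
           "during the 1960s ...")

kb = {(d, "a", ":Drug")
      for d in (":Aspirin", ":Paracetamol", ":Ibuprofen", ":Quinine")}
kb |= {
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Aspirin", "owl:sameAs", "db:Aspirin"),
}
# Stand-in for the linked dataset: it offers a comment but no :relatedCompany.
external = {("db:Ibuprofen", "rdfs:comment", COMMENT)}

enriched = dereference(kb, external, {":relatedCompany", "rdfs:comment"})
new_triples = enriched - kb
```

As on the slide, only rdfs:comment is found in the linked data, so :relatedCompany has to come from a different atomic function.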
  27. Atomic Enrichment Functions II: NLP atomic enrichment function. Datatype objects contain unstructured information. The function uses Named Entity Recognition to extract implicit data and adds the extracted entities to the source dataset. Example: from the rdfs:comment of :Ibuprofen ("Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."), the entity :BootsCompany is extracted and attached to :Ibuprofen via fox:relatedTo.
  30. Self-Configuration II: NLP Enrichment Function. Extracts all possible named entity types and adds the extracted entities to the source dataset. NLP-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :Ibuprofen fox:relatedTo :BootsCompany.
  33. Atomic Enrichment Functions III: Predicate conformation atomic enrichment function. Enriched datasets may contain diverse ontologies. Predicate conformation maps a set of pre-specified predicates to a target ontology. Example: in the CBD of :Ibuprofen, the predicate fox:relatedTo (pointing to :BootsCompany) is replaced by :relatedCompany.
  36. Self-Configuration III: Predicate conformation Enrichment Function. Finds the lists of predicates $P_s$ and $P_t$ from the source and target datasets, respectively, that share the same subject and object. Replaces each $P_s$ with its respective $P_t$. NLP-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :Ibuprofen fox:relatedTo :BootsCompany. Enriched CBD of Ibuprofen (positive example target): the same triples, with :relatedCompany in place of fox:relatedTo.
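The pairing of source and target predicates described above can be sketched as follows. This is an assumption-laden illustration: `learn_predicate_mapping` and `conform` are hypothetical names, and when several target predicates match the same subject and object the sketch simply keeps the last one in sorted order.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def learn_predicate_mapping(source_cbd: Iterable[Triple],
                            target_cbd: Iterable[Triple]) -> Dict[str, str]:
    """Pair P_s with P_t when both connect the same subject and object."""
    source, target = set(source_cbd), set(target_cbd)
    mapping: Dict[str, str] = {}
    for s, p_s, o in sorted(source - target):
        for t_s, p_t, t_o in sorted(target - source):
            if (t_s, t_o) == (s, o):
                mapping[p_s] = p_t
    return mapping


def conform(kb: Iterable[Triple], mapping: Dict[str, str]) -> Set[Triple]:
    """Replace every mapped source predicate with its target predicate."""
    return {(s, mapping.get(p, p), o) for s, p, o in kb}


COMMENT = ("Ibuprofen was extracted by the research arm of Boots company "
           "during the 1960s ...")
shared = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Ibuprofen", "rdfs:comment", COMMENT),
}
nlp_enriched = shared | {(":Ibuprofen", "fox:relatedTo", ":BootsCompany")}
target = shared | {(":Ibuprofen", ":relatedCompany", ":BootsCompany")}

mapping = learn_predicate_mapping(nlp_enriched, target)
conformed = conform(nlp_enriched, mapping)
```

Only triples that differ between the two CBDs are compared, so predicates that are already identical on both sides are left untouched.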
  37. KB Enrichment Refinement Operator. Input: a set of atomic enrichment functions $\mathcal{M}$ and a set of positive examples $E$. Refinement operator: $\rho(M) = \{\, M \mathbin{+\!\!+} m \mid m \in \mathcal{M} \,\}$, where $+\!\!+$ is the list append operator. Output: an enrichment pipeline $M$.
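Read literally, the refinement operator appends each atomic enrichment function to the current pipeline. A minimal sketch, assuming pipelines are tuples of function names; `refine` is an illustrative name.

```python
from typing import List, Sequence, Tuple

Pipeline = Tuple[str, ...]


def refine(pipeline: Pipeline, atomic_functions: Sequence[str]) -> List[Pipeline]:
    """Return every pipeline obtained by appending one atomic function."""
    return [pipeline + (m,) for m in atomic_functions]


atomic_functions = ["m1", "m2", "m3"]
first_level = refine((), atomic_functions)
refined = refine(("m3", "m2"), atomic_functions)
```

Applied to the empty pipeline, this produces the three single-function pipelines that form the first level of the refinement tree.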
  40. Positive Example. Non-enriched CBD of Ibuprofen: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen. Enriched CBD of Ibuprofen: the same triples plus :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." and :Ibuprofen :relatedCompany :BootsCompany.
  48. Learning Algorithm. (1) Start with an empty enrichment pipeline $M = \bot$. (2) Self-configure all $m_i \in \mathcal{M}$ and add each as a child of $\bot$. (3) Select the most promising node. (4) Expand the most promising node. Example refinement tree: $\bot$ has the children $(m_1)$, $(m_2)$ and $(m_3)$; $(m_1)$ is expanded first, giving $(m_1, m_2)$ and $(m_1, m_3)$; then $(m_3)$ is expanded, giving $(m_3, m_1)$ and $(m_3, m_2)$; finally $(m_3, m_2)$ is expanded, giving $(m_3, m_2, m_1)$.
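The four steps above describe a best-first search over the refinement tree. The sketch below is a simplified illustration rather than DEER's algorithm: it skips self-configuration, selects leaves by F-measure alone (no complexity penalty), scores pipelines on a single positive example, and uses three hypothetical atomic functions.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
Atomic = Callable[[KB], KB]


def f_measure(produced: KB, expected: KB) -> float:
    """Harmonic mean of triple-level precision and recall."""
    true_positives = len(produced & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(produced)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def learn_pipeline(source: KB, target: KB, atomic: Dict[str, Atomic],
                   max_iterations: int = 10) -> Tuple[Tuple[str, ...], float]:
    # Each leaf is (F-measure, pipeline as a tuple of names, resulting KB).
    leaves = [(f_measure(source, target), (), source)]
    best = leaves[0]
    for _ in range(max_iterations):
        if best[0] == 1.0 or not leaves:
            break
        leaves.sort(key=lambda leaf: leaf[0])
        _, pipeline, kb = leaves.pop()  # most promising leaf
        for name, m in atomic.items():  # expand it with every atomic function
            child_kb = m(kb)
            child = (f_measure(child_kb, target), pipeline + (name,), child_kb)
            leaves.append(child)
            if child[0] > best[0]:
                best = child
    return best[1], best[0]


COMMENT_TRIPLE = (":Ibuprofen", "rdfs:comment", "Ibuprofen was extracted ...")
COMPANY_TRIPLE = (":Ibuprofen", ":relatedCompany", ":BootsCompany")


def deref_comment(kb: KB) -> KB:
    # Hypothetical pre-configured dereferencing function.
    return kb | {COMMENT_TRIPLE}


def extract_company(kb: KB) -> KB:
    # Hypothetical stand-in for NLP plus conformation; needs the comment.
    return kb | {COMPANY_TRIPLE} if COMMENT_TRIPLE in kb else kb


def drop_types(kb: KB) -> KB:
    # Hypothetical harmful function: deletes all type ("a") triples.
    return frozenset(t for t in kb if t[1] != "a")


source = frozenset({(":Ibuprofen", "a", ":Drug"),
                    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen")})
target = source | {COMMENT_TRIPLE, COMPANY_TRIPLE}
atomic = {"deref_comment": deref_comment,
          "extract_company": extract_company,
          "drop_types": drop_types}

pipeline, score = learn_pipeline(source, target, atomic)
```

The toy search stops as soon as a pipeline reproduces the target exactly or the iteration budget is used up, whichever comes first.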
  49. Most Promising Node Selection. Node complexity $c(n)$: a linear combination of the node's children count and level. Node fitness $f(n)$: the difference between the F-measure of the node's enrichment pipeline and its weighted complexity, $f(n) = F(n) - \omega \cdot c(n)$. $\omega$ controls the trade-off between greedy search ($\omega = 0$) and search strategies closer to breadth-first search ($\omega > 0$). Most promising node: the leaf node with the maximum fitness in the whole refinement tree.
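The fitness formula above can be tried out on two toy leaves. In this sketch the `Node` class and the coefficients `alpha` and `beta` of the linear combination are assumptions; the slides do not give the actual weights.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Node:
    name: str
    f_measure: float     # F(n) of the node's enrichment pipeline
    children_count: int
    level: int           # depth in the refinement tree


def complexity(node: Node, alpha: float = 1.0, beta: float = 1.0) -> float:
    """c(n): linear combination of children count and level (assumed weights)."""
    return alpha * node.children_count + beta * node.level


def fitness(node: Node, omega: float) -> float:
    """f(n) = F(n) - omega * c(n)."""
    return node.f_measure - omega * complexity(node)


def most_promising(leaves: Iterable[Node], omega: float) -> Node:
    return max(leaves, key=lambda node: fitness(node, omega))


leaves = [
    Node("deep", f_measure=0.9, children_count=0, level=3),
    Node("shallow", f_measure=0.8, children_count=0, level=1),
]
greedy_choice = most_promising(leaves, omega=0.0)
penalised_choice = most_promising(leaves, omega=0.25)
```

With $\omega = 0$ the deeper leaf wins on F-measure alone; with $\omega = 0.25$ its level penalty outweighs the small F-measure advantage and the shallower leaf is selected, which is the drift towards breadth-first behaviour described on the slide.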
  53. Experimental Setup. Datasets: 1 manual experimental enrichment pipeline for Jamendo; 2 manual experimental enrichment pipelines for DrugBank; 5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion). Learning algorithm: 6 atomic enrichment functions. Termination criteria: a maximum of 10 iterations is reached, or an optimal enrichment pipeline is found (F-score = 1).
  55. Configuration of the Search Strategy. Node fitness: $f(n) = F(n) - \omega \cdot c(n)$, where $\omega$ controls the trade-off between greedy search ($\omega = 0$) and search strategies closer to breadth-first search ($\omega > 0$). Result: $\omega = 0.75$ leads to the best results.

      | ω    | P   | R    | F    |
      |------|-----|------|------|
      | 0    | 1.0 | 0.99 | 0.99 |
      | 0.25 | 1.0 | 0.99 | 0.99 |
      | 0.50 | 1.0 | 0.99 | 0.99 |
      | 0.75 | 1.0 | 1.0  | 1.0  |
      | 1.0  | 1.0 | 0.99 | 0.99 |

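  56. Effect of Positive Examples.

      | Manual M    | Examples count | Size of M | Time M(KB) | Size of learned M′ | Time M′(KB) | Learn time | Iterations count | F-score |
      |-------------|----------------|-----------|------------|--------------------|-------------|------------|------------------|---------|
      | M1 DBpedia  | 1              | 1         | 0.2        | 1                  | 1.6         | 1.3        | 1                | 1.0     |
      | M1 DBpedia  | 2              | 1         | 0.2        | 1                  | 1.8         | 1.3        | 1                | 1.0     |
      | M2 DBpedia  | 1              | 2         | 23.3       | 1                  | 0.1         | 0.2        | 1                | 0.99    |
      | M2 DBpedia  | 2              | 2         | 15         | 2                  | 17          | 0.3        | 9                | 0.99    |
      | M3 DBpedia  | 1              | 3         | 14.7       | 3                  | 15.2        | 6.1        | 9                | 0.99    |
      | M3 DBpedia  | 2              | 3         | 15         | 2                  | 15.1        | 0.1        | 9                | 0.99    |
      | M4 DBpedia  | 1              | 4         | 0.4        | 2                  | 0.1         | 0.7        | 2                | 0.99    |
      | M4 DBpedia  | 2              | 4         | 0.6        | 2                  | 0.3         | 0.9        | 2                | 0.99    |
      | M5 DBpedia  | 1              | 5         | 22         | 2                  | 0.1         | 0.7        | 2                | 1.0     |
      | M5 DBpedia  | 2              | 5         | 25.5       | 2                  | 0.2         | 0.9        | 2                | 1.0     |
      | M1 DrugBank | 1              | 2         | 3.5        | 1                  | 4.1         | 0.1        | 10               | 0.99    |
      | M1 DrugBank | 2              | 2         | 3.6        | 1                  | 3.4         | 0.1        | 10               | 0.99    |
      | M2 DrugBank | 1              | 3         | 25.2       | 1                  | 0.1         | 0.1        | 10               | 0.99    |
      | M2 DrugBank | 2              | 3         | 22.8       | 1                  | 0.1         | 0.1        | 10               | 0.99    |
      | M1 Jamendo  | 1              | 1         | 10.9       | 2                  | 10.6        | 0.1        | 2                | 0.99    |
      | M1 Jamendo  | 2              | 1         | 10.4       | 2                  | 10.4        | 0.1        | 1                | 0.99    |
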
  58. Conclusion and Future Work. Conclusion: presented self-configuring atomic enrichment functions; presented an approach for learning enrichment pipelines based on a refinement operator; showed that the approach can easily reconstruct manually created enrichment pipelines. Future work: parallelize the algorithm on several CPUs, including load balancing; support directed acyclic graphs as enrichment specifications by allowing datasets to be split and merged; pro-active enrichment strategies and active learning.
  60. Thank You! Questions? Contact: Mohamed Sherif, Augustusplatz 10, D-04109 Leipzig, sherif@informatik.uni-leipzig.de, http://aksw.org/MohamedSherif, http://aksw.org/Projects/DEER, #akswgroup.
