Ontology-driven KDD Process Composition

Full paper: http://boole.diiga.univpm.it/paper/ida09.pdf

One of the most interesting challenges in the Knowledge Discovery in Databases (KDD) field is supporting users in composing tools into a valid and useful KDD process. This activity requires users both to choose tools suitable for their knowledge discovery problem and to compose them into a KDD process. To this end, they need expertise and knowledge about the functionalities and properties of all the KDD algorithms implemented in the available tools. To support users in this demanding activity, in this paper we introduce a goal-driven procedure for automatically composing algorithms. The proposed procedure exploits KDDONTO, an ontology formalizing the domain of KDD algorithms, which allows us to generate valid and non-trivial processes.



  1. UNIVERSITA’ POLITECNICA DELLE MARCHE, DIIGA (Dipartimento di Ingegneria Informatica, Gestionale e dell’Automazione), Ancona, Italy
     Ontology-Driven KDD Process Composition
     Claudia Diamantini, Domenico Potena, Emanuele Storti
     {diamantini, potena, storti}@diiga.univpm.it
     www.diiga.univpm.it
     IDA09, Lyon, Aug 31
  3. Introduction
     Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [Fayyad et al., 1996]
     Many sources of complexity:
     - iterative/interactive process
     - many tasks and phases
     - several algorithms available for each phase, each with specific:
       - characteristics and interfaces
       - preconditions/postconditions
       - performance
     Systems are needed that support users in composing algorithms into valid and useful KDD processes.
  9. Aim of the work
     Idea: adding semantics to KDD algorithms to support an automatic KDD process composition procedure:
     - formalizing the knowledge of KDD experts into an ontology describing algorithms, their interfaces and their relations
     - defining techniques for matching algorithms with compatible interfaces
     - defining a goal-oriented composition procedure which starts from user requests (goal, dataset, constraints) and produces a list of valid processes ranked according to some criteria
  10. Framework
      KDDVM project: a service-oriented system for sharing, discovering, accessing and executing Data Mining and KDD tools.
      Information is separated into 3 logical layers:
      - KDD Algorithm: abstract algorithm
      - KDD Tool: specific implementation of an algorithm
      - KDD Service: tool running on a specific machine
      This work operates at the Algorithm level: the output consists of prototype KDD processes.
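The three-layer separation can be illustrated with a minimal Python sketch. The class names (`KDDAlgorithm`, `KDDTool`, `KDDService`), their fields and the helper `prototype_process` are hypothetical illustrations, not the KDDVM project's actual data model; the sketch only shows how a chain of concrete services maps back to an algorithm-level prototype process.

```python
from dataclasses import dataclass

# Hypothetical model of the three KDDVM layers (names and fields are assumptions).


@dataclass(frozen=True)
class KDDAlgorithm:
    name: str                 # abstract algorithm


@dataclass(frozen=True)
class KDDTool:
    algorithm: KDDAlgorithm   # the algorithm this tool implements
    implementation: str       # hypothetical label of the concrete implementation


@dataclass(frozen=True)
class KDDService:
    tool: KDDTool             # the tool exposed by this service
    host: str                 # hypothetical machine running the tool


def prototype_process(services):
    """Abstract a chain of services to the algorithm level."""
    return [service.tool.algorithm.name for service in services]


if __name__ == "__main__":
    algo = KDDAlgorithm("ExampleClassifier")          # hypothetical algorithm name
    tool = KDDTool(algo, "example-implementation")
    service = KDDService(tool, "host.example.org")
    print(prototype_process([service]))               # ['ExampleClassifier']
```

Because composition works on the first layer only, two services running different tools for the same algorithm map to the same prototype step in this sketch.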
  12. KDD Ontology (1)
      KDDONTO is an ontology formalizing the domain of KDD algorithms:
      - developed following a formal methodology [Noy, 2002] (concept definition → logic modeling → translation into OWL → evaluation)
      - taking into account quality requirements [Gruber, 1995]
      Main classes and relations:
      - Algorithm, Method
      - Task, Phase
      - Data, DataFeature
      - Performance
      - has_input/has_output
      - ...
  13. KDD Ontology (2)
      KDDONTO is conceived to support process composition.
      Properties useful for representing algorithm interfaces:
      - has_condition: pre/postconditions for some input/output data
      - in_module/out_module: suggestions about composable algorithms
      - not_with/not_before: explicit incompatibilities between methods
      Properties useful for representing relations among data:
      - part_of/has_part: relations between a compound datum and its subcomponents
      - in_contrast: explicit incompatibilities between conditions
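The interface and data properties above lend themselves to simple consistency checks. The following Python sketch assumes one possible reading of their semantics (stated in the comments): `not_with` forbids two methods in the same process, `not_before` forbids one method from preceding another, and `in_contrast` marks two conditions as mutually incompatible. The method and condition names are hypothetical placeholders, and KDDONTO itself is expressed in OWL rather than as Python sets.

```python
# Hypothetical encoding of three KDDONTO-style properties as plain Python sets.
# Assumed semantics (one reading of the slide, not the ontology's formal axioms):
#   not_with(m1, m2):    m1 and m2 must not appear in the same process
#   not_before(m1, m2):  m1 must not appear anywhere before m2
#   in_contrast(c1, c2): conditions c1 and c2 cannot hold together
NOT_WITH = {("MethodA", "MethodB")}
NOT_BEFORE = {("MethodC", "MethodD")}
IN_CONTRAST = {("normalized", "not_normalized")}


def contrasts(c1, c2):
    """in_contrast is treated as symmetric."""
    return (c1, c2) in IN_CONTRAST or (c2, c1) in IN_CONTRAST


def violates_method_constraints(methods):
    """Check an ordered list of methods against not_with and not_before."""
    for i, m1 in enumerate(methods):
        for j, m2 in enumerate(methods):
            if i == j:
                continue
            if (m1, m2) in NOT_WITH:               # checked in both orders
                return True
            if i < j and (m1, m2) in NOT_BEFORE:   # m1 placed before m2
                return True
    return False


def conditions_compatible(postconditions, preconditions):
    """A producer's postconditions must not contrast with a consumer's preconditions."""
    return not any(contrasts(p, q) for p in postconditions for q in preconditions)


if __name__ == "__main__":
    print(violates_method_constraints(["MethodD", "MethodC"]))        # False
    print(conditions_compatible({"normalized"}, {"not_normalized"}))  # False
```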
  19. Algorithm Matchmaking
      Linking algorithms with compatible interfaces:
      - Exact Match: interfaces share the same data (equivalence only)
      - Approximate Match: interfaces share similar data (is-a and part-of relations, inferential reasoning on KDDONTO)
      Example of exact match, matchE({A1, A2}, B):
        $in^{1}_{B} \equiv_o out^{1}_{A_1}$,  $in^{2}_{B} \equiv_o out^{2}_{A_1}$,  $in^{3}_{B} \equiv_o out^{1}_{A_2}$
      Example of approximate match, matchA({A1, A2}, B):
        VQ (input of B) part_of LVQ (output of A1);  DATASET (output of A2) $\equiv_o$ DATASET (input of B)
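The two kinds of matching can be sketched in Python over a toy fragment of the ontology. Assumptions: equivalence (≡o) is approximated by name equality; an output can feed an input when the output is-a (possibly transitively) the input; part-of is applied in the direction of the slide's example, where B's input VQ is part_of A1's output LVQ. The `IS_A` entry for `LabeledDataset` is hypothetical, and the outputs of A1 and A2 are simply pooled; a real implementation would derive these relations by reasoning over KDDONTO.

```python
# Toy ontology fragment as child -> parent edges (single parent, assumed acyclic).
IS_A = {"LabeledDataset": "DATASET"}   # hypothetical: LabeledDataset is-a DATASET
PART_OF = {"VQ": "LVQ"}                # from the slide: VQ part_of LVQ


def reachable(concept, edges):
    """Concepts reachable from `concept` by following a transitive relation."""
    found = set()
    while concept in edges and edges[concept] not in found:
        concept = edges[concept]
        found.add(concept)
    return found


def equivalent(datum_in, datum_out):
    """Stand-in for ontological equivalence (≡o): plain name equality."""
    return datum_in == datum_out


def similar(datum_in, datum_out):
    """Approximate compatibility between a required input and an available output."""
    return (equivalent(datum_in, datum_out)
            or datum_in in reachable(datum_out, IS_A)       # output is-a input
            or datum_out in reachable(datum_in, PART_OF))   # input part_of output


def match_exact(outputs, inputs):
    """matchE: every input of B is equivalent to some output of A1, ..., An."""
    return all(any(equivalent(i, o) for o in outputs) for i in inputs)


def match_approx(outputs, inputs):
    """matchA: every input of B is similar to some output of A1, ..., An."""
    return all(any(similar(i, o) for o in outputs) for i in inputs)


if __name__ == "__main__":
    outputs_a1_a2 = ["LVQ", "DATASET"]   # A1 produces LVQ, A2 produces DATASET
    inputs_b = ["VQ", "DATASET"]
    print(match_exact(outputs_a1_a2, inputs_b))   # False: VQ has no equivalent output
    print(match_approx(outputs_a1_a2, inputs_b))  # True: VQ part_of LVQ
```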
  23. Composition Procedure (1)
      Goal-driven procedure for composing KDD processes, exploiting KDDONTO and matching functionalities: it produces a subset of all possible valid processes.
      Three phases:
      I. Definition of dataset, goal and user constraints
      - Dataset: a Dataset type and a set of instances of the DataFeature class, e.g.: LabeledDataset {float, balanced, normalized, missing_values}
      - Goal: an instance of the Task class, e.g.: CLASSIFICATION
      - Pruning criteria:
        - max number of algorithms in a process
        - max cost of a process
        - max computational complexity
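Phase I can be captured by a small request object plus a pruning test. This is a sketch under assumptions not stated on the slide: the cost of a process is taken as the sum of its steps' costs, computational complexity is encoded as a hypothetical ordinal rank (1 = linear, 2 = quadratic, ...), and a process is bounded by its most complex step. The class names, the budget value and the step names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRequest:
    dataset_type: str            # e.g. "LabeledDataset"
    dataset_features: frozenset  # instances of DataFeature, e.g. {"float", ...}
    goal: str                    # an instance of Task, e.g. "CLASSIFICATION"
    max_algorithms: int          # pruning: max number of algorithms in a process
    max_cost: float              # pruning: max cost of a process
    max_complexity_rank: int     # pruning: hypothetical ordinal complexity bound


@dataclass(frozen=True)
class Step:
    algorithm: str
    cost: float
    complexity_rank: int         # hypothetical: 1 = linear, 2 = quadratic, ...


def satisfies_pruning(process, request):
    """True if a candidate process respects all three pruning criteria."""
    total_cost = sum(step.cost for step in process)               # assumed additive
    worst_rank = max((step.complexity_rank for step in process), default=0)
    return (len(process) <= request.max_algorithms
            and total_cost <= request.max_cost
            and worst_rank <= request.max_complexity_rank)


if __name__ == "__main__":
    request = UserRequest(
        dataset_type="LabeledDataset",
        dataset_features=frozenset({"float", "balanced", "normalized", "missing_values"}),
        goal="CLASSIFICATION",
        max_algorithms=5,
        max_cost=10.0,           # hypothetical budget
        max_complexity_rank=2,
    )
    process = [Step("ExampleImputer", 1.0, 1), Step("ExampleClassifier", 3.0, 2)]
    print(satisfies_pruning(process, request))   # True
```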
  30. Composition Procedure (2)
      II. Process building
      Starts from the task and goes backwards iteratively: at each iteration, algorithms are added to processes by exploiting matching functionalities.
      Stop conditions:
      - no process can be further expanded
      - some process constraints are violated
      Output: only valid processes, i.e. processes:
      - satisfying the user goal (task)
      - compatible with the given dataset
      III. Process ranking
      The cost function takes into account: kind of match (exact / approximate), precondition relaxation, algorithm performance, ...
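Phases II and III can be sketched as a backward search in Python. Assumptions: only exact matching (name equality) is used, the dataset is represented by its type name, no algorithm is reused within a process, the stop conditions are modeled as an exhausted frontier and a cap on the number of algorithms, and ranking uses only the sum of hypothetical per-algorithm costs, whereas the cost function described above also weighs the kind of match, precondition relaxation and algorithm performance. The `Algorithm` class, `compose` and the algorithm names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    name: str
    inputs: tuple        # data types required
    outputs: tuple       # data types produced
    task: str = ""       # Task instance this algorithm achieves, if any
    cost: float = 1.0    # hypothetical scalar cost used for ranking


def compose(algorithms, goal, dataset, max_algorithms):
    """Goal-driven backward composition (exact matching only), then ranking."""
    # Phase II: start from the algorithms that achieve the goal task.
    # A partial process is (steps ordered first-to-last, inputs not yet produced).
    frontier = [((a,), frozenset(a.inputs)) for a in algorithms if a.task == goal]
    valid = []
    while frontier:                                  # stop: nothing left to expand
        steps, open_inputs = frontier.pop()
        unresolved = open_inputs - {dataset}         # the user's dataset feeds these
        if not unresolved:
            valid.append(steps)                      # reaches the goal from the dataset
            continue
        if len(steps) >= max_algorithms:             # stop: constraint would be violated
            continue
        needed = min(unresolved)                     # expand one open input at a time
        for a in algorithms:
            # Prepend any unused algorithm producing the needed datum (no reuse,
            # which also rules out cycles).
            if needed in a.outputs and a not in steps:
                remaining = (unresolved - set(a.outputs)) | set(a.inputs)
                frontier.append(((a,) + steps, frozenset(remaining)))
    # Phase III (simplified): rank valid processes by total cost, cheapest first.
    return sorted(valid, key=lambda process: sum(a.cost for a in process))


if __name__ == "__main__":
    algorithms = [  # hypothetical algorithms
        Algorithm("Classifier", ("CleanDataset",), ("Model",), "CLASSIFICATION", 3.0),
        Algorithm("ImputeMissing", ("LabeledDataset",), ("CleanDataset",), cost=1.0),
        Algorithm("DropMissing", ("LabeledDataset",), ("CleanDataset",), cost=0.5),
    ]
    for process in compose(algorithms, "CLASSIFICATION", "LabeledDataset", 5):
        print([a.name for a in process])
    # ['DropMissing', 'Classifier']
    # ['ImputeMissing', 'Classifier']
```

Since every producer is prepended after its consumer is already in the process, producers always precede the steps that use their outputs.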
  36. KDDComposer
      A prototype implementing the composition procedure.
      Example scenario:
      - Task: CLASSIFICATION
      - Dataset: LabeledDataset
      - Dataset features: {float, normalized, missing_values, ...}
      - Constraints: max 5 algorithms, etc.
      Results: a ranked list of many valid processes.
      Compared to a non-ontological approach:
      - more valid processes (thanks to inference)
      - fewer invalid processes (thanks to ontological and non-ontological pruning)
  37. Conclusion
      Procedure for composing valid KDD processes, based on a semantic representation of algorithms and data.
      Advantages:
      - KDDONTO: resulting processes are valid; supports complex pruning strategies
      - Approximate Match: more valid results (novel w.r.t. other works in the literature)
      - Ranking according to both ontological and non-ontological criteria
      - Prototype processes can themselves be considered valid, previously unknown and useful knowledge, valuable for both novice and expert users
      Future work: translating each prototype process into a concrete workflow of KDD Web Services
  38. Project website: http://boole.diiga.univpm.it