Semantic-Driven Design and Management of KDD Processes

Full paper: http://boole.diiga.univpm.it/paper/cts10.pdf

Published in: Technology

Semantic-Driven Design and Management of KDD Processes

  1. 1. Università Politecnica delle Marche, Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione, Ancona, Italy. Semantic-Driven Design and Management of KDD Processes. Emanuele Storti, storti@diiga.univpm.it. CTS 2010, Chicago, May 19.
  2. 2. Introduction. Organizations need methods and technologies to analyze huge amounts of data in support of decision-making processes. Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
  3. 3. Introduction. Process → iteration, many steps. Knowledge → user interaction. Team work → virtual organizations.
  6. 6. Introduction. Process → iteration, many steps. Knowledge → user interaction. Team work → virtual organizations. KDD in a Collaborative Distributed Scenario, with actors: DM expert, domain experts, KDD expert, DBA. Examples: KD for enterprises, e-Science workflows.
  7. 7. Major issues. Some general questions: How to provide support for process design? How to manage execution and interactions? Distribution of users and tools: How to locate the needed tools? (localization) How to manage coordination? (coordination) Many KDD tools are available for each phase/task: How to set up and execute the tools? (heterogeneity) How to compose them? (integration) How to support novice users? (complexity)
  8. 8. Approach (i). Service-oriented platform for sharing, discovering, accessing, and executing data analysis and knowledge discovery tools. KDD tools produced by different organizations are remotely accessible as basic services through standard protocols. Formalization of experts' knowledge in a conceptual semantic model, to support advanced services (process composition). KDDONTO: an ontology for describing algorithms, interfaces, data structures, methods, and tasks. It provides: sharing of knowledge and agreement on definitions (each actor can refer to the same definition of an algorithm or data); a human- and machine-understandable description (conceptual/formal model); automatic reasoning; support for non-expert users.
  9. 9. Approach (ii). Separation of information in different layers (reusability): Algorithm, described in the ontology; Service, which implements a specific algorithm and whose descriptor points to the corresponding ontological concept. Figure (KDDONTO fragment): the classes Algorithm, ClassificationAlgorithm, and DecisionTreeAlgorithm linked by is-a, with the algorithm ID3; the service ID3_v.2.3 is-a Service, and its descriptor points to ID3.
  10. 10. Process composition. Diagram: in the COMPOSITION phase, the composer takes the goal, dataset, and requirements as input, uses KDDONTO, and produces an abstract process.
  11. 11. Process composition. Planner for semiautomatic composition of abstract KDD processes. 1. Algorithm match: given two algorithms, are they compatible? (Based on ontology properties: exact vs. approximate match.) Figure labels: "y is equal to y" and "X2 is part_of X".
  12. 12. Process composition. 2. Goal-oriented composition procedure: iterative execution of algorithm match. Input: goal, dataset, some constraints. Execution: backwards, from goal to dataset. Output: a ranked list of valid abstract processes. Prototype: KDDComposer.
  13. 13. Translation to concrete process. Diagram: COMPOSITION (composer, KDDONTO; input: goal, dataset, requirements) produces an abstract process; TRANSLATION (syntactic verification, broker, UDDI) produces a concrete process.
  14. 14. Verification and Execution. Collaborative/distributed scenario: complex interactions among actors and time-consuming transactions. Guarantees about process correctness are needed at design time. Reo: a "glue code" for explicitly modeling interaction among components (tools, GUI, ...). Steps: 1. Specification of the interaction protocol. 2. Interaction design. 3. Specs verification.
  15. 15. Verification and Execution. Diagram: COMPOSITION (composer, KDDONTO; input: goal, dataset, requirements) produces an abstract process; TRANSLATION (syntactic verification, broker, UDDI) produces a concrete process; VERIFICATION (REO modeler, model checking); EXECUTION (exec).
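
The algorithm match and the goal-oriented, backward composition described in slides 11 and 12 can be illustrated with a minimal Python sketch. It is not the KDDComposer implementation. Everything in it is an assumption made for illustration: the algorithm names (`KMeans`, `LVQTrain`, `VQReport`), the data types, the single `part_of` fact, the single-input/single-output simplification, treating the goal as a desired output data type, and reading the approximate match as "the required input is part_of the produced output". Plans are ranked by the number of approximate matches they rely on.

```python
from dataclasses import dataclass

# Hypothetical part_of facts: the key is a component of the value.
PART_OF = {"VQ": "LVQ"}


@dataclass(frozen=True)
class Algorithm:
    name: str
    input: str   # data type the algorithm requires (single input for brevity)
    output: str  # data type the algorithm produces


def match_cost(produced, required):
    """Return 0 for an exact match, 1 for an approximate match, None otherwise."""
    if produced == required:
        return 0  # exact: same data type
    if PART_OF.get(required) == produced:
        return 1  # approximate: required type is part_of the produced type
    return None


def compose(goal, dataset, algorithms, max_steps=4):
    """Search backwards from the goal type to the dataset type.

    Returns a list of (cost, [algorithm names in execution order]),
    sorted so that plans with fewer approximate matches come first.
    """
    results = []

    def search(needed, chain, cost):
        # The dataset itself may satisfy what the current chain needs.
        c = match_cost(dataset, needed)
        if c is not None:
            results.append((cost + c, [a.name for a in chain]))
        if len(chain) >= max_steps:
            return
        for alg in algorithms:
            if alg in chain:
                continue  # avoid reusing an algorithm (keeps the search finite)
            c = match_cost(alg.output, needed)
            if c is not None:
                # Prepend: we are walking from the goal back to the dataset.
                search(alg.input, [alg] + chain, cost + c)

    search(goal, [], 0)
    return sorted(results, key=lambda r: r[0])


# Hypothetical catalogue of algorithms.
ALGORITHMS = [
    Algorithm("KMeans", "Dataset", "VQ"),
    Algorithm("LVQTrain", "Dataset", "LVQ"),
    Algorithm("VQReport", "VQ", "ClusterReport"),
]

if __name__ == "__main__":
    for cost, plan in compose("ClusterReport", "Dataset", ALGORITHMS):
        print(cost, " -> ".join(plan))
```

In this toy catalogue the plan that uses only exact matches is ranked ahead of the plan that feeds an `LVQ` output into an algorithm requiring `VQ`. A real planner would also handle multiple inputs per algorithm, user constraints, and the richer properties stored in the ontology.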
