Università Politecnica delle Marche                Department of Computer Science, Management and Automation              ...
Introduction  Data explosion & KDD  Organizations need methods and technologies to analyze  huge amounts of data, to suppo...
Introduction  1st generation of IDAs (Intelligent Data Analysis systems):  ● local frameworks  ● single-user  ● predefined...
Issues  Heterogeneity & tool distribution  Many KDD and Data Mining tools available for any domain/task,  many possible co...
Issues  User distribution  Distributed organizations:  ● multiple branch enterprises  ● E-Science project  Collaboration: ...
Methodology  Service Oriented Architecture  Basic Services                    Support Services  Services for any KDD task:...
Methodology  Semantic descriptors for Basic Services                     Separation of information           in    3      ...
Methodology         KDD algorithms            ID3         KDD tools     KDD services  Benefits: loose-coupling, reusabili...
Methodology  KDD ontology       Remove missing values                         C4.5                          algorithm     ...
KDDesigner  A web-based tool aimed at supporting users in collaborative  KDD process designCTS2011, May 23-27             ...
Service discovery  Retrieval of KDD services satisfying user requirementsCTS2011, May 23-27                   A Semantic-A...
Service discovery  Retrieval of KDD services satisfying user requirements                                            4    ...
Service discovery  Retrieval of KDD services satisfying user requirements                                      1          ...
Process designCTS2011, May 23-27   A Semantic-Aided Designer for KD
Interface matchmaking  Verification of data compatibility in an I/O connectionCTS2011, May 23-27                      A Se...
Interface matchmaking Matchmaker service checks the validity of the match ●     syntactic compatibility     comparison bet...
Interface matchmaking Matchmaker service checks the validity of the match ●     syntactic compatibility     comparison bet...
Semi-automatic composition KDDComposer: advanced service for composition Input ● user dataset ● a set of requirements (max...
Collaboration  ● collaborative process edit/annotation (wiki-style)  ● versioning system  ● team management and add of new...
Conclusion SOA for KDD      ●   Basic Services and Support Services      ●   KDD Designer: a semantic-aided designer for K...
Upcoming SlideShare
Loading in...5
×

A Semantic-Aided Designer for Knowledge Discovery

185

Published on

Full paper: http://boole.diiga.univpm.it/paper/cts11.pdf

Knowledge Discovery in Databases (KDD), as any scientific experimentation in e-Science, is a complex and
computationally intensive process aimed at gaining knowledge from a huge set of data. Often performed in
distributed settings, a KDD project usually involves a deep interaction between tools and several users with specific expertise, which can be either a co-located group or a geographically distributed virtual team of experts. Given the complexity of the process, such users need some support to achieve their goal of knowledge extraction. This paper introduces KDDesigner, a webbased semantic-driven tool aimed at supporting users in the collaborative design of a KDD process. In this paper we address semantic issues related to the collaborative project managment: tool localization, tool integration, interfaces matchmaking, process execution, team building and process versioning.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
185
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

A Semantic-Aided Designer for Knowledge Discovery

  1. 1. Università Politecnica delle Marche Department of Computer Science, Management and Automation Ancona, Italy A Semantic-Aided Designer for Knowledge Discovery Claudia Diamantini, Domenico Potena, Emanuele Storti e.storti@univpm.itCTS2011, Philadelphia, May 23-27
  2. 2. Introduction Data explosion & KDD Organizations need methods and technologies to analyze huge amounts of data, to support decisional processes Knowledge Discovery in Databases (KDD) is the process of identifying valid, novel, potentially useful patterns in data  many steps, iterations  interaction  user knowledgeCTS2011, May 23-27 A Semantic-Aided Designer for KD
  3. 3. Introduction 1st generation of IDAs (Intelligent Data Analysis systems): ● local frameworks ● single-user ● predefined set of tools (little extensibility) 2nd generation: distribution of tools & computational aspects Evolution of organizations: distribution of user, collaboration domain expert DB / DWH administrator How to support the design of a KDD project in an open, distributed and collaborative scenario? DM KDD specialists specialistsCTS2011, May 23-27 A Semantic-Aided Designer for KD
  4. 4. Issues Heterogeneity & tool distribution Many KDD and Data Mining tools available for any domain/task, many possible combinations  Heterogeneous interfaces programming languages, OSs, transfer protocols,..  Complex to use process design, data preparation, precondition satisfaction, I/O interpretation  tools should be easily and dinamically added in the platform  they should be accessible, searchable, executable via standard API  suggestions about the best tool sequences  support for tool setup and process executionCTS2011, May 23-27 A Semantic-Aided Designer for KD
  5. 5. Issues User distribution Distributed organizations: ● multiple branch enterprises ● E-Science project Collaboration: ● source of complexity ● distributed computation: several users can succeed where a single user is likely to fail  collaborative design of KDD processes  tool/process sharing and annotation  easy join of new partners in Virtual TeamsCTS2011, May 23-27 A Semantic-Aided Designer for KD
  6. 6. Methodology Service Oriented Architecture Basic Services Support Services Services for any KDD task: Back-end services: ● access control every KDD tool is wrapped ● data transfer as a Web Service, deployed ● service publishing on the publishers server, ● UDDI registry and published in a common repository High-level functionalities: ● service discovery ● interface matchmaking ● process composition C4.5 tool C4.5 serviceCTS2011, May 23-27 A Semantic-Aided Designer for KD
  7. 7. Methodology Semantic descriptors for Basic Services Separation of information in 3 abstraction layers Tools/services are annotated through XML descriptors: details about interfaces and QoS Algorithms are formally described in a KDD ontology, which contains an algorithm taxonomy and high level information about their tasks, methods and functionalitiesCTS2011, May 23-27 A Semantic-Aided Designer for KD
  8. 8. Methodology KDD algorithms ID3 KDD tools KDD services  Benefits: loose-coupling, reusability  Support services rely on such layers:  service discovery  interface matchmaking  process compositionCTS2011, May 23-27 A Semantic-Aided Designer for KD
  9. 9. Methodology KDD ontology Remove missing values C4.5 algorithm Labeled Dataset algorithm KDD services abc C4.5_v.2.0  Benefits: loose-coupling, reusability  Support services rely on such layers:  service discovery  interface matchmaking  process compositionCTS2011, May 23-27 A Semantic-Aided Designer for KD
  10. 10. KDDesigner A web-based tool aimed at supporting users in collaborative KDD process designCTS2011, May 23-27 A Semantic-Aided Designer for KD
  11. 11. Service discovery Retrieval of KDD services satisfying user requirementsCTS2011, May 23-27 A Semantic-Aided Designer for KD
  12. 12. Service discovery Retrieval of KDD services satisfying user requirements 4 1 2 3 KDDONTOCTS2011, May 23-27 A Semantic-Aided Designer for KD
  13. 13. Service discovery Retrieval of KDD services satisfying user requirements 1 4 2 3CTS2011, May 23-27 A Semantic-Aided Designer for KD
  14. 14. Process designCTS2011, May 23-27 A Semantic-Aided Designer for KD
  15. 15. Interface matchmaking Verification of data compatibility in an I/O connectionCTS2011, May 23-27 A Semantic-Aided Designer for KD
  16. 16. Interface matchmaking Matchmaker service checks the validity of the match ● syntactic compatibility comparison between service descriptors (I/O primitive datatype and syntax) same format? KDD services same primitive abc datatype? ● Output: cost of matchCTS2011, May 23-27 A Semantic-Aided Designer for KD
  17. 17. Interface matchmaking Matchmaker service checks the validity of the match ● syntactic compatibility comparison between service descriptors (I/O primitive datatype and syntax) ● semantic compatibility comparison between ontological annotations of the services (kind of match between I/O, preconditions/postconditions... and many more) same concept? KDD ontology x subconcept? y part-of concept? KDD services abc ● Output: cost of matchCTS2011, May 23-27 A Semantic-Aided Designer for KD
  18. 18. Semi-automatic composition KDDComposer: advanced service for composition Input ● user dataset ● a set of requirements (max num algorithms, computational complexity, max cost of match) ● user goal (classification, regression, ...) Output A ranked list of abstract processes (suggestions about processes useful to solve the user problem)CTS2011, May 23-27 A Semantic-Aided Designer for KD
  19. 19. Collaboration ● collaborative process edit/annotation (wiki-style) ● versioning system ● team management and add of new users ● manual parameter settingCTS2011, May 23-27 A Semantic-Aided Designer for KD
  20. 20. Conclusion SOA for KDD ● Basic Services and Support Services ● KDD Designer: a semantic-aided designer for KDD Open environment and heterogeneous tools ● different interfaces: need of a common representation (service) ● abstraction for an high-level description of tools (algorithm) ● semantics for interoperability and high-level functionalities Future work ● extension with new support services ● process export in more workflow languages ● more collaborative features (real-time editor)CTS2011, May 23-27 A Semantic-Aided Designer for KD
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×