Distributed Data Mining Research using Grids and Web - Iceage ...

  • 10/05/10 ISSGC'08: Balatonfüred, Hungary - 6th-18th July 2008. Advances in technology are making massive data sets common in many scientific disciplines, such as astronomy, medical imaging, bioinformatics, remote sensing, and physics. To find useful information in these datasets,
  • Meridian line on the ear for the small intestine.
  • Frequency range is reduced
  • Simple graphic
  • Spikes
  • SPMD (Single Program, Multiple Data) is the dominant style of parallel computing. All processors use the same program, but each has its own data.
  • 1 Slave = 1 CPU
  • Transcript of "Distributed Data Mining Research using Grids and Web - Iceage ..."

    1. Session 28: Distributed Data Mining Research using Grids and Web Services. Author/Presenter: Peter Brezany, University of Vienna, Austria. 11 July
    2. Motivation (Balatonfüred, Hungary - 6th-18th July 2008). Diagram labels: Business; Medicine; Scientific experiments; Simulations; Earth observations; Data and data exploration cloud
    3. Outline
         • Motivation
         • Selected projects ←
         • Data mining model
         • Towards high productivity analytics
         • Parallel and distributed data mining and OLAP in GridMiner/ADMIRE projects
         • Future developments
    4. Selected Projects
    5. A Long-Term Biodiversity, Ecosystem and Awareness Research Network – ALTER-Net. Author: Kathi Schleidt. Diagram labels: Common Ontology; Waste; Air; Soil; Water; Emission; Biodiversity; Forests; Distributed Data; Distributed Data Mining; Flow Analysis; Geo-Statistic; Reporting; Popular Presentation; Prediction Models; Distributed Applications; …; Statistic
    6. China-Austria Data Grid (CADGrid)
         • Main idea: Medical Meridian Measurement Grid (M³G) for On-Line Diagnosis
         • The diabetic domain is the first domain to profit substantially from the project results
    7. Motivation (slide footer: 3-Dec-07 CADGrid)
         • Meridian theory is an important part of Traditional Chinese Medicine (TCM)
         • Clinical practices of TCM (esp. acupuncture) have been guided by meridian theory for thousands of years
         • More than 4000 years of experience
         • Knowledge that we should not only use but also support with modern high-tech measurement and IT technologies
    8. Meridian-Theory Basics (1)
         • According to TCM, the human body has 14 acupuncture meridians
         • Secret to our biological and medical knowledge
         • Each meridian has main points called source points
    9. Meridian-Theory Basics (2)
         • Using data mining techniques, correlations between these points can be identified, e.g.
             • start-end point correlation
             • symmetric point correlation
         • If there is pain at one place along a meridian, a good effect can be achieved by treating another place on the same line
    10. Meridian-Theory Basics (3)
         • Meridians can transport
             • physical, medical, biological material and information
         • The characteristics (weaker or stronger output, time delay, …) gained by analyzing electro-signals sensed from meridians have a strong relationship with the organs of the human body (heart, lung, brain, …)
    11. Meridian Measurement Methods: 1 Active, 2 Passive
    12. Active Measurement. Up-flow point: lower electrical potential. Down-flow point: higher electrical potential. Fingers and toes: zero potential. Diagram labels: Data-file; Down-flow-key-point; Up-flow-key-point; Human-body-meridian; Send; Measure; Record; Record
    13. Passive Measurement. Diagram labels: Data-file; Measure; Record; to the ground of the instrument
    14. Application 1: Non-invasive Glucose Measurement (NIGM). Meridian Measurement Instrument
    15. The First Prototype
    16. NIGM Workflow
    17. M³G Services for Diabetics: NIGM-Service – Model Setup
    18. M³G Services for Diabetics: NIGM-Service – Use Model
    19. M³G Services for Diabetics: NIGM-Service – Maintain Model
    20. CADGrid Framework
    21. Intelligence Base
    22. Future Work
         • Extension to other domains
             • Brain Informatics domain
    23. CRISP-DM. Diagram labels: Business Understanding; Data Understanding; Data Preparation; Modelling; Evaluation; Deployment; Data
    24. Towards High Productivity Analytics. A Project Sponsored by [sponsor logo not preserved]. Motivation: [figure not preserved]
    25. High Productivity Analytics
         • Our definition: "A high-productivity analytics system is one that delivers a high level of performance, guarantees a high level of accuracy of analytics models and other results extracted from analyzed data sets while scoring equally on other aspects, like usability, robustness, system management, and ease of programming."
    26. High Productivity Analytics Research Agenda
         • High performance services developed by high productivity languages and tools
         • Efficient workflow management (building and execution)
         • Advanced GUI
         • Illustration on the GridMiner system
    27. GridMiner Data Mining Model. Diagram labels: Data; Business understanding; Data understanding; Data Preparation; Modeling; Evaluation; Deployment; CRISP-DM, SPSS; Service Provider (three instances); Data provider; GridMiner User; Virtual Organization
    28. GridMiner Conceptual Architecture. Data and functional resources can be geographically distributed; the focus is on virtualization and large-scale data mining. Diagram labels: Data Warehouse; Knowledge; Cleaning and Integration; Selection and Transformation; Data Mining; Evaluation and Presentation; OLAP; Online Analytical Mining; OLAP Queries
    29. Motivation for large-scale data mining. [Figure: accuracy plotted against sampled data size, up to 100% of the available data size. Legend: q_i = data quality, m_i = modeling method. Point labels: (q_0, m_0), also rendered as (q_o, m_o); Assumed; (q_i, m_i)]
    30. Service Parallelism Levels: Inter-Service & Intra-Service Parallelism
    31. Hybrid Programming Model
         • SPMD – Single Program Multiple Data (used for programming multiprocessor architectures)
         • +
         • SSMD – Single Service Multiple Data (introduced by us for programming service-oriented architectures)
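
As a rough illustration of the SPMD idea on slide 31, the following minimal Python sketch runs one and the same function on every worker, with each worker operating on its own block of the data. The names `block_bounds`, `spmd_program`, and `run_spmd` are hypothetical, and a thread pool stands in for real processors; this is not how GridMiner implements SPMD or SSMD, which the slides do not detail.

```python
from concurrent.futures import ThreadPoolExecutor


def block_bounds(rank, size, n):
    """Return the [start, stop) range of n items owned by worker `rank` of `size`."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def spmd_program(rank, size, data):
    """The single program: every worker runs this code on its own block."""
    start, stop = block_bounds(rank, size, len(data))
    return sum(x * x for x in data[start:stop])


def run_spmd(data, size):
    """Launch `size` workers (threads here, as a stand-in) and combine their results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda r: spmd_program(r, size, data), range(size)))
    return partials, sum(partials)


data = list(range(10))
partials, total = run_spmd(data, 3)
print(partials, total)
```

By analogy, SSMD could be read as invoking the same service on several data partitions, but that reading is an assumption rather than something the slide spells out.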
    32. 1. Construction of Decision Trees: SPRINT (Scalable PaRallelizable INduction of decision Tree). Out-of-core algorithm. The splitting attribute at a node is determined by the Gini index. Figure labels: categorical; continuous; class; Splitting Attributes
    33. Phase 1 - Preparation
    34. Phase 2 - Execution
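
Slide 32 states that the splitting attribute at a node is chosen by the Gini index. The sketch below shows that criterion for a single continuous attribute: it sorts the values, tries each midpoint between distinct neighbours, and keeps the threshold with the lowest weighted Gini index. The helper names and the toy data are illustrative assumptions, and the code recomputes class counts for every candidate, which is simpler (and slower) than the incremental, out-of-core bookkeeping a SPRINT implementation would use.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_continuous_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best


# Toy data (hypothetical): one continuous attribute and a two-valued class.
ages = [23, 17, 43, 68, 32, 20]
risk = ["high", "high", "high", "low", "low", "high"]
threshold, score = best_continuous_split(ages, risk)
print(threshold, round(score, 4))
```

In a full tree builder the same search would be repeated for every attribute, and the attribute with the lowest score would become the splitting attribute of the node.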
    35. 2. Construction of Neural Networks. Diagram labels: Node; Error Back-Propagation; Input layer; Output layer; Hidden layer; +; -; Desired output; Σ Sum; Limiter (sigmoid function); Weighted inputs; Outputs
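
The node on slide 35 (weighted inputs, a sum, a sigmoid limiter, and an error fed back from the desired output) can be sketched as a single neuron with one gradient step. This is a minimal sketch under stated assumptions: the function names, the squared-error loss, and the learning rate `lr` are illustrative choices, not taken from the slides.

```python
import math


def sigmoid(z):
    """Sigmoid limiter applied to the weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, inputs):
    """Weighted inputs -> sum -> sigmoid, as in the node diagram."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def backprop_step(weights, bias, inputs, desired, lr=0.5):
    """One gradient-descent update for the loss 0.5 * (desired - output)**2."""
    out = forward(weights, bias, inputs)
    error = desired - out
    delta = error * out * (1.0 - out)  # error times the sigmoid derivative
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * delta
    return new_weights, new_bias, error


weights, bias = [0.0, 0.0], 0.0
inputs, desired = [1.0, 1.0], 1.0
new_weights, new_bias, error_before = backprop_step(weights, bias, inputs, desired)
error_after = desired - forward(new_weights, new_bias, inputs)
print(error_before, error_after)
```

In a multi-layer network the same delta is propagated backwards through the hidden layer; only the output-node case is shown here.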
    36. Parallel Algorithm
         • Challenges
             • Training real NNs is extremely computationally intensive.
             • Many practical NN applications (e.g., speech and face recognition) involve a large number of adjustable parameters and training patterns to achieve the needed accuracy.
         • Solution
             • Parallel training algorithms
             • Development of services running in high performance hardware and software environments
    37. Programming Environment: Titanium
         • The goals: performance, safety, and expressiveness.
         • A language that gives its users access to modern program structuring through object-oriented technology and enables them to write explicitly parallel code.
         • Based on a parallel SPMD model of computation with a global address space.
         • Titanium uses Java as its base but is not a strict extension of Java.
         • Compiler: Titanium → C + communication
    38. Overview of Distributed Solution. Diagram labels: Master; Sub-master 0; Sub-master 1; Slave0; Slave1; Slave2; Slave0; Slave1; Training Data for Sub-master 0; Training Data for Sub-master 1; Data Distribution Scheme 1; Data Distribution Scheme 2
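
The master / sub-master / slave hierarchy on slide 38 can be pictured as two levels of data partitioning followed by a reduction back up the tree. The sketch below is an assumption-laden illustration: the slide does not describe its two data distribution schemes, so the proportional split, the slave counts `[3, 2]`, and the placeholder `slave_work` (a sum standing in for a partial weight update) are all hypothetical.

```python
def split_by_weights(items, weights):
    """Split items into consecutive chunks with sizes proportional to weights."""
    total = sum(weights)
    chunks, start, acc = [], 0, 0
    for w in weights:
        acc += w
        stop = round(len(items) * acc / total)
        chunks.append(items[start:stop])
        start = stop
    return chunks


def slave_work(patterns):
    """Placeholder for training on a block: returns (partial sum, pattern count)."""
    return sum(patterns), len(patterns)


def sub_master(patterns, n_slaves):
    """Distribute a block evenly over the slaves and combine their results."""
    parts = split_by_weights(patterns, [1] * n_slaves)
    results = [slave_work(p) for p in parts]
    return sum(s for s, _ in results), sum(c for _, c in results)


def master(patterns, slaves_per_sub_master):
    """Give each sub-master a share proportional to its slave count, then reduce."""
    parts = split_by_weights(patterns, slaves_per_sub_master)
    results = [sub_master(p, n) for p, n in zip(parts, slaves_per_sub_master)]
    total = sum(s for s, _ in results)
    count = sum(c for _, c in results)
    return total / count


patterns = list(range(1, 11))  # stand-in for training patterns
mean = master(patterns, [3, 2])
print(mean)
```

Splitting in proportion to the number of slaves keeps the per-slave load roughly equal, in line with the "1 Slave = 1 CPU" note; other schemes are equally plausible.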
    39. The Parallel Implementation. Diagram labels: VGE Client; VGE Server. VGE = Vienna Grid Environment
    40. The Distributed & Parallel Implementation. Diagram labels: VGE Client; VGE Server
    41. 3. On-Line Analytical Processing (OLAP). Figure: a three-dimensional data cube
    42. Distributed OLAP – Aggregation of Compute and Storage Resources. Diagram label: Tuple Stream
    43. OLAP Service. Diagram labels: Virtual Cube; Sub Cube; Sub Cube; Sub Cube; Slave 1; Slave 2; Slave 3; Master; Data; Indexes; Index Service; query; answer; XML
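
Slides 41 to 43 describe a three-dimensional data cube whose sub-cubes live on slaves, with a master answering queries over the virtual cube. A minimal sketch of that idea follows: a tuple stream is partitioned across slaves, each slave aggregates its share into a sub-cube, and the master merges partial roll-ups. The dimension names, the round-robin partitioning, and the function names are assumptions for illustration; the indexing and XML query interface shown on the slide are not modelled.

```python
from collections import defaultdict


def build_sub_cube(tuples):
    """Aggregate (product, region, year, amount) tuples into cells of a 3-D cube."""
    cube = defaultdict(float)
    for product, region, year, amount in tuples:
        cube[(product, region, year)] += amount
    return cube


def roll_up(sub_cube, keep):
    """Sum a sub-cube over all dimensions except those listed in `keep` (indices 0..2)."""
    out = defaultdict(float)
    for cell, value in sub_cube.items():
        out[tuple(cell[i] for i in keep)] += value
    return out


def master_query(sub_cubes, keep):
    """Master: ask every slave for its partial roll-up and merge the answers."""
    merged = defaultdict(float)
    for sub_cube in sub_cubes:
        for key, value in roll_up(sub_cube, keep).items():
            merged[key] += value
    return dict(merged)


# Hypothetical tuple stream: (product, region, year, amount).
stream = [
    ("A", "north", 2007, 10.0),
    ("A", "south", 2007, 5.0),
    ("B", "north", 2008, 7.0),
    ("A", "north", 2008, 3.0),
    ("B", "south", 2008, 2.0),
    ("A", "north", 2007, 1.0),
]
n_slaves = 3
sub_cubes = [build_sub_cube(stream[i::n_slaves]) for i in range(n_slaves)]
by_product = master_query(sub_cubes, keep=(0,))
by_region_year = master_query(sub_cubes, keep=(1, 2))
print(by_product)
```

Because sums are associative, each slave can pre-aggregate independently and the master only has to add up small partial results, which is what makes this kind of distribution attractive for large cubes.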
    44. Workflow Composition Approaches. Diagram labels: Manual Composition; Automated Composition; Passive Approach; Active Approach; Workflow Editor; Workflow Composer; Workflow Engine; Workflow Description; Analytical Services; KB; Reasoner; Resources Monitoring
    45. GridMiner Workflow Composition Editor
    46. Towards Next-Generation Grids. Diagram labels: Current Grids; Next-Generation Grid; Evolution of the Web; Knowledge Technologies; Evolution of HPCN; Mobile Services; Computational Grid; Data Grid; Data Mining Grid; Semantic Grid – 1st Generation