The Art and Technology of Data Mining

  1. Data Mining: The Art and Science of Obtaining Knowledge from Data. Dr. Saed Sayad, 05/10/10
  2. Agenda
     • Explosion of data
     • Introduction to data mining
     • Examples of data mining in science and engineering
     • Challenges and opportunities
  3. Explosion of Data
     • Data in the world doubles every 20 months!
     • NASA's Earth Observing System:
         • 46 megabytes of data per second
         • 4,000,000,000,000 bytes a day
     • FBI fingerprint image library:
         • 200,000,000,000,000 bytes
     • In-line image analysis for particle detection:
         • 1 megabyte per second
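As a quick sanity check on the NASA figures on the slide above, 46 megabytes per second sustained for a day comes to roughly 4 × 10^12 bytes. The sketch below assumes decimal megabytes (10^6 bytes) and a constant rate all day; neither assumption is stated on the slide, and the function name is only illustrative.

```python
# Assumption: 1 megabyte = 10**6 bytes and the data rate is constant all day.
BYTES_PER_MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(mb_per_second):
    """Convert a sustained rate in megabytes per second to bytes per day."""
    return mb_per_second * BYTES_PER_MB * SECONDS_PER_DAY


nasa_daily = bytes_per_day(46)    # NASA rate quoted on the slide
inline_daily = bytes_per_day(1)   # in-line image analysis rate quoted on the slide

print(f"NASA: {nasa_daily:.3e} bytes/day")
print(f"In-line imaging: {inline_daily:.3e} bytes/day")
```

Under these assumptions the NASA rate works out slightly below the rounded 4,000,000,000,000 bytes quoted on the slide.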
  4. Explosion of Data (cont.)
  5. Explosion of Data (cont.)
  6. Explosion of Data (cont.)
  7. Explosion of Data (cont.)
  8. What do we need? Fast, accurate, and scalable data analysis techniques to extract useful knowledge. The answer is Data Mining.
  9. What is Data Mining? "Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules." (Diagram labels: Data, Data Mining, Knowledge)
  10. Diagram labels: AI, Machine Learning; Statistics; Data Mining; Database; Data Analysis; Data Warehouse; OLAP
  11. Diagram labels: Data Mining; Data Analysis; Database; Statistics; Machine Learning; Data Warehouse; OLAP
  12. Database
      Database     Text Files     Relational Database              Multi-dimensional Database
      Entities     File           Table                            Cube
      Attributes   Row and Col    Record, Field, Index             Dimension, Level, Measurement
      Methods      Read, Write    Select, Insert, Update, Delete   Drill down, Drill up, Drill through
      Language     -              SQL                              MDX
  13. Data Analysis
     • Classification
     • Regression
     • Clustering
     • Association
     • Sequence Analysis
  14. Data Analysis
     • Input Variables or Attributes (X1, X2): numeric (age, income, …) or categorical (gender, occupation, …)
     • Model (W1, W2): Linear Models or Decision Trees
     • Output Variables or Targets (Y1, Y2): numeric, giving Regression (0,1), or categorical, giving Classification (good, bad)
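To make the input, model and output picture on the slide above concrete, here is a minimal sketch of a linear model with two inputs and two weights. The weight values, input values and the 0.5 threshold are made up for illustration; the slide names W1 and W2 but gives no numbers.

```python
# Hypothetical weights and threshold; the slide names W1 and W2 but gives no values.
def linear_model(x1, x2, w1, w2):
    """Regression view: the output is a numeric weighted sum of the inputs."""
    return w1 * x1 + w2 * x2


def classify(score, threshold=0.5):
    """Classification view: the numeric score is mapped to a categorical target."""
    return "good" if score >= threshold else "bad"


score = linear_model(x1=0.4, x2=0.8, w1=0.5, w2=0.5)
label = classify(score)
print(score, label)
```

The same weighted sum serves both tasks: left as a number it is a regression output, and passed through a threshold it becomes a class label.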
  15. Data Analysis (cont.)
     • Clustering: plot of Age against Income
     • Association: transactions "1, chips, coke, chocolate", "2, gum, chips", "3, chips, coke", "4, …"; Probability (chips, coke)?
     • Sequence Analysis: … ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…; X(t-1), X(t), T
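The association question on the slide above, "Probability (chips, coke)?", is usually answered with support and confidence. The sketch below uses only the three transactions the slide spells out; the fourth is elided on the slide and is left out here. The function names are illustrative.

```python
# Only the three transactions listed on the slide; the fourth is elided there.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


p_chips_coke = support({"chips", "coke"}, transactions)
conf_chips_to_coke = confidence({"chips"}, {"coke"}, transactions)
conf_coke_to_chips = confidence({"coke"}, {"chips"}, transactions)
print(p_chips_coke, conf_chips_to_coke, conf_coke_to_chips)
```

On these three baskets, chips and coke appear together in two of three transactions, and every basket with coke also contains chips.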
  16. Data Mining in Research Life Cycle
     • Questions
     • Needs
     • Diagram labels: Search, Re search, Experiment, Modeling, Report, Library, Data, Database, Data Analysis
  17. Data Mining – Modeling Steps
     • Problem Definition
     • Data Preparation
     • Exploration
     • Modeling
     • Evaluation
     • Deployment
  18. Agenda
     • Explosion of data
     • Introduction to data mining
     • Examples of data mining in science and engineering
     • Challenges and opportunities
  19. Examples of data mining in science & engineering
     • 1. Data mining in Biomedical Engineering: "Robotic Arm Control Using Data Mining Techniques"
     • 2. Data mining in Chemical Engineering: "Data Mining for In-line Image Monitoring of Extrusion Processing"
  20. 1. Problem Definition: "Control a robotic arm by means of EMG signals from biceps and triceps muscles."
      Muscle Contraction   Biceps   Triceps
      Supination           H        H
      Pronation            L        L
      Flexion              H        L
      Extension            L        H
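The biceps and triceps table on the slide above can be read as a lookup from signal levels to arm movement. The sketch below assumes H and L mean high and low signal levels, that signals are normalised to the range 0 to 1, and that a single hypothetical threshold of 0.5 separates the two; none of this is stated on the slide, and the deployed model was a trained classifier rather than a fixed rule.

```python
# Mapping taken from the slide's table: (biceps level, triceps level) -> movement.
MOVEMENT_BY_LEVELS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, threshold):
    """Discretise a signal into 'H' or 'L' (the threshold is a made-up value)."""
    return "H" if signal >= threshold else "L"


def movement(biceps, triceps, threshold=0.5):
    """Look up the movement for a pair of EMG signal values."""
    return MOVEMENT_BY_LEVELS[(level(biceps, threshold), level(triceps, threshold))]


print(movement(0.9, 0.1))
print(movement(0.2, 0.8))
```

A hand-set threshold like this is brittle on noisy EMG data, which is one reason to learn the boundary from labelled records instead.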
  21. 2. Data Preparation
     • The dataset includes 80 records.
     • There are two input variables: biceps signal and triceps signal.
     • There is one output variable with four possible values: Supination, Pronation, Flexion and Extension.
  22. 3. Exploration: scatter plot of Triceps signal against Record#, by movement (Flexion, Extension, Supination, Pronation)
  23. 3. Exploration (cont.): scatter plot of Biceps signal against Record#, by movement (Flexion, Extension, Supination, Pronation)
  24. 4. Modeling
     • Classification
         • OneR
         • Decision Tree
         • Naïve Bayesian
         • K-Nearest Neighbors
         • Neural Networks
         • Linear Discriminant Analysis
         • Support Vector Machines
         • …
  25. 6. Model Deployment: A neural network model was successfully implemented inside the robotic arm.
  26. Examples of data mining in science & engineering
     • 1. Data mining in Biomedical Engineering: "Robotic Arm Control Using Data Mining Techniques"
     • 2. Data mining in Chemical Engineering: "Data Mining for In-line Image Monitoring of Extrusion Processing"
  27. Plastics Extrusion (figure labels: Plastic pellets, Plastic melt)
  28. Film Extrusion (figure labels: Extruder, Plastic Film, Defect due to particle contaminant)
  29. In-Line Monitoring (figure labels: Window Ports, Transition Piece)
  30. In-Line Monitoring (figure labels: Light Source, Extruder and Interface, Optical Assembly, Imaging Computer, Light)
  31. Melt Without Contaminant Particles (WO)
  32. Melt With Contaminant Particles (WP)
  33. 1. Problem Definition: Classify images into those with particles (WP) and those without particles (WO).
  34. 2. Data Preparation
     • 2000 images
     • 54 input variables, all numeric
     • One output variable with two possible values
         • With Particle
         • Without Particle
  35. 2. Data Preparation (cont.)
     • Pre-processed images to remove noise
     • Dataset 1 with sharp images: 1350 images including 1257 without particles and 91 with particles
     • Dataset 2 with sharp and blurry images: 2000 images including 1909 without particles or with blurry particles, and 91 with particles
     • 54 input variables, all numeric
     • One output variable with two possible values (WP and WO)
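With so few WP images, it helps to know what a trivial classifier would score before reading any accuracy figures. The sketch below computes the majority-class baseline from the class counts listed on the slide above. Note that the counts listed for Dataset 1 sum to 1348 rather than the stated 1350; the sketch uses the listed class counts as they are. The function name is illustrative.

```python
def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


# Class counts as listed on the slide (Dataset 1 counts sum to 1348, not 1350).
dataset1 = {"WO": 1257, "WP": 91}
dataset2 = {"WO": 1909, "WP": 91}

baseline1 = majority_baseline(dataset1)
baseline2 = majority_baseline(dataset2)
print(f"Dataset 1 baseline: {baseline1:.4f}")
print(f"Dataset 2 baseline: {baseline2:.4f}")
```

Any model worth deploying should beat these baselines, so accuracy alone is a weak yardstick on data this imbalanced.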
  36. 3. Exploration: Demo!
  37. 4. Modeling
     • Classification:
         • OneR
         • Decision Tree
         • 3-Nearest Neighbors
         • Naïve Bayesian
  38. 5. Evaluation: 10-fold cross-validation. One-R rule: If pixel_density_max < 142 then WP.
      Dataset                  Attrib.   Class   One-R   C4.5   3-NN   Bayes
      Sharp Images             54        2       99.9    99.8   99.8   95.8
      Sharp + Blurry Images    54        2       98.5    97.8   97.8   93.3
      Sharp + Blurry Images    54        3       87      87     84     79
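The rule and the cross-validation procedure named on the slide above can be sketched in a few lines. The rule and its threshold of 142 come from the slide; the records, the fold assignment scheme and the helper names are made up for illustration, and the sketch uses 5 folds only because the toy data has 10 records. This is not the author's implementation.

```python
def one_r_rule(pixel_density_max, threshold=142):
    """The single-attribute rule from the slide: below the threshold means WP."""
    return "WP" if pixel_density_max < threshold else "WO"


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i takes every k-th record from i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def accuracy(records, predict):
    """Fraction of (value, label) records the predictor labels correctly."""
    correct = sum(1 for value, label in records if predict(value) == label)
    return correct / len(records)


# Made-up (pixel_density_max, label) records, for illustration only.
records = [(120, "WP"), (200, "WO"), (150, "WO"), (130, "WP"), (141, "WP"),
           (142, "WO"), (180, "WO"), (90, "WP"), (160, "WO"), (139, "WO")]

fold_scores = []
for _train_idx, test_idx in k_fold_indices(len(records), k=5):
    # A full OneR learner would choose the attribute and threshold from the
    # training fold; the rule is fixed here, so only the test fold is scored.
    test_records = [records[i] for i in test_idx]
    fold_scores.append(accuracy(test_records, one_r_rule))

mean_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_accuracy)
```

Averaging the per-fold scores gives the cross-validated accuracy; each record is used for testing exactly once.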
  39. 6. Deploy model
     • A Visual Basic program will be developed to implement the model.
  40. Agenda
     • Explosion of data
     • Introduction to data mining
     • Examples of data mining in science & engineering
     • Challenges and opportunities
  41. Challenges and Opportunities
     • Data mining is a 'top ten' emerging technology.
     • High-paying jobs in the financial, medical and engineering sectors!
     • Faster, more accurate and more scalable techniques.
     • Incremental, on-line and real-time learning algorithms.
     • Parallel and distributed data processing techniques.
  42. Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems. You can be part of the solution!
