DATA MINING Team #1 Kristen Durst Mark Gillespie Banan Mandura University of Dayton MBA 664 13 APR 09
  1. DATA MINING
     Team #1: Kristen Durst, Mark Gillespie, Banan Mandura
     University of Dayton, MBA 664, 13 APR 09
  2. Data Mining: Outline
     - Introduction
     - Applications / Issues
     - Products
     - Process
     - Techniques
     - Example
  3. Introduction
     - Data Mining Definition
       - Analysis of large amounts of digital data
       - Identify unknown patterns, relationships
       - Draw conclusions AND predict future
     - Data Mining Growth
       - Increase in computer processing speed
       - Decrease in cost of data storage
  4. Introduction
     - High Level Process
       - Summarize the Data
       - Generate Predictive Model
       - Verify the Model
     - Analyst Must Understand
       - The business
       - Data and its origins
       - Analysis methods and results
       - Value provided
  5. Applications / Issues
     - Applications
       - Telecommunications
         - Cell phone contract turnover
       - Credit Card
         - Fraud identification
       - Finance
         - Corporate performance
       - Retail
         - Targeting products to customers
     - Legal and Ethical Issues
       - Aggregation of data to track individual behavior
  6. Data Mining Products
     - Angoss Software (www.angoss.com)
       - Knowledge Seeker/Studio
       - Strategy Builder
     - Infor Global Solutions (www.infor.com)
       - Infor CRM Epiphany
     - Portrait Software (www.portraitsoftware.com)
     - SAS Institute (www.sas.com)
       - SAS Enterprise Miner
       - SAS Analytics
     - SPSS Inc (www.spss.com)
       - Clementine
  7. Angoss Knowledge Studio
  8. SAS Institute
  9. SPSS Inc.
  10. Data Mining Process
      - No uniformly accepted practice
      - 2002 www.KDnuggets.com survey
        - SPSS CRISP-DM
        - SAS SEMMA
  11. Data Mining Process
      - SPSS CRISP-DM
        - CRoss Industry Standard Process for Data Mining
        - Consortium: Daimler-Chrysler, SPSS, NCR
        - Hierarchical Process – Cyclical and Iterative
  12. Data Mining Process
      - CRISP-DM
  13. Data Mining Process
      - SAS SEMMA
        - Model development is focus
        - User defines problem, conditions data outside SEMMA
          - Sample – portion data, statistically
          - Explore – view, plot, subgroup
          - Modify – select, transform, update
          - Model – fit data, any technique
          - Assess – evaluate for usefulness
  14. Data Mining Process
      - Common Steps in Any DM Process
        - 1. Problem Definition
        - 2. Data Collection
        - 3. Data Review
        - 4. Data Conditioning
        - 5. Model Building
        - 6. Model Evaluation
        - 7. Documentation / Deployment
  15. Data Mining Techniques
      - Statistical Methods (Sample Statistics, Linear Regression)
      - Nearest Neighbor Prediction
      - Neural Network
      - Clustering/Segmenting
      - Decision Tree
  16. Statistical Methods
      - Sample Statistics
        - Quick look at the data
        - Ex: Minimum, Maximum, Mean, Median, Variance
      - Linear Regression
        - Easy to apply; works well for simple problems
        - Complex problems may need a more complex model built with a different method
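The sample statistics listed above can be computed with Python's standard library. This is a minimal sketch: the `summarize` helper and the `lead_times` values are hypothetical and are not taken from the presentation.

```python
import statistics


def summarize(values):
    """Return the quick-look statistics named on the slide."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample variance (divides by n - 1), not population variance.
        "variance": statistics.variance(values),
    }


# Hypothetical values for illustration only.
lead_times = [4, 8, 6, 5, 7]
summary = summarize(lead_times)
print(summary)
```

A summary like this is usually the first step of a data review, before any model is fitted.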
  17. Example: Linear Regression
      [Plot; recoverable axis label: Customer Income]
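A one-predictor linear regression of the kind plotted on this slide can be sketched with ordinary least squares. The slide only preserves the axis label "Customer Income", so the response variable (`spend`), the data values, and the `fit_line` helper below are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: customer income and a made-up response variable.
incomes = [30, 40, 50, 60, 70]
spend = [5, 7, 9, 11, 13]

slope, intercept = fit_line(incomes, spend)
predicted = intercept + slope * 55  # prediction for a new income value
print(slope, intercept, predicted)
```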
  18. Nearest Neighbor Prediction
      - Easy to understand
      - Used for predicting
      - Works best with few predictor variables
      - Based on the idea that an item will behave like the items "near" it
      - Can also show level of confidence in prediction
  19. Example: Nearest Neighbor
      [Scatter plot: Product Sales by Population of City and Distance from Competitor. Axes: Distance from Competitor, Population of City. Points are labeled A, B, or C, with one point labeled U.]
      - A: > 200 units
      - B: 100 – 200 units
      - C: < 100 units
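Nearest neighbor prediction with a confidence level can be sketched as a k-nearest-neighbor vote. The coordinates below are hypothetical stand-ins for the plotted cities (the slide's actual positions are not recoverable), and using the share of agreeing neighbors as the confidence is one common choice, assumed here rather than taken from the presentation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return (label, confidence) from the k training points closest to query.

    train is a list of ((x, y), label) pairs. Confidence is the fraction of
    the k neighbors that voted for the winning label.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical (population, distance from competitor) points on a common scale.
# In practice the two features should be scaled so neither dominates the distance.
train = [
    ((1, 1), "C"), ((1, 2), "C"), ((2, 1), "C"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"),
    ((9, 9), "A"), ((9, 8), "A"), ((8, 9), "A"),
]

label, confidence = knn_predict(train, (7, 7.2), k=3)
print(label, confidence)
```

A query that sits between two groups gets a lower confidence than one surrounded by a single class, which matches the slide's point that the method can report how sure it is.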
  20. Neural Network
      - Contains input, hidden, and output layers
      - Used when there is a large number of predictor variables
      - Model can be used again and again once confirmed successful
      - Can be hard to interpret
      - Extremely time consuming to format the data
  21. Example: Neural Network
      [Network diagram: inputs Population of City and Distance from Competitor; weights W1 = .36, W2 = .64; output Product Sales Prediction; value shown: 0.736]
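The weighted combination in the diagram can be sketched as a single-neuron forward pass. Only the two weights come from the slide. Which weight belongs to which input, the sigmoid activation, the zero bias, and the scaled input values are all assumptions, and the hidden layer is omitted, so this sketch is not expected to reproduce the 0.736 shown on the slide.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Weights from the slide; the pairing with inputs is assumed.
weights = [0.36, 0.64]
# Hypothetical inputs, already scaled to 0..1:
# [population of city, distance from competitor]
inputs = [0.5, 0.8]

output = predict(inputs, weights)
print(output)
```

Scaling every input to a common range before training is part of the data formatting effort the previous slide describes as time consuming.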
  22. Clustering/Segmenting
      - Not used for prediction
      - Forms groups that are very similar or very different
      - Gives an overall view of the data
      - Can also be used to identify potential problems if there is an outlier
  23. Example: Clustering/Segmenting
      [Cluster plot: groups for < 40 years and >= 40 years; Red = Female, Blue = Male; axis: Dimension A]
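One way to form such groups is k-means clustering, sketched below. The presentation does not say which clustering algorithm was used, so k-means is an assumption, as are the `kmeans` helper, the deterministic starting centroids, and the (age, Dimension A) data points.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of numbers; returns (centroids, clusters)."""
    # Deterministic start for the sketch: the first k points are the centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical (age in years, Dimension A) observations.
# Age dominates the distance here because the features are not scaled.
points = [(25, 1.0), (52, 3.9), (30, 1.2), (28, 0.8), (55, 4.1), (60, 4.0)]

centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

A point that ends up far from every centroid is a candidate outlier, which is the problem-spotting use the slide mentions.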
  24. Decision Trees
      - Uses categorical variables
      - Determines which variable produces the greatest "split" in the data
      - Easy to interpret
      - Not much data formatting
      - Can be used for many different situations
  25. Example: Decision Trees
      [Tree diagram: Change from original score]
      - Split labels: Baseline < 3.75 / Baseline >= 3.75; F / M; Large body type / Small body type
      - Node values as extracted: .14 (n = 115); .58 (n = 67); -.46 (n = 48); -.63 (n = 24); -.29 (n = 24); -.29 (n = 24); .76 (n = 51); .47 (n = 28); 1.11 (n = 23)
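Choosing the variable that causes the greatest split can be sketched as picking the categorical variable whose groups leave the least within-group variation in the target. The sum-of-squared-errors criterion, the `best_split` helper, and the four data rows are assumptions for illustration; the slide does not state which splitting criterion its software used.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target, candidates):
    """Return (variable, reduction) for the split that most reduces SSE."""
    base = sse([row[target] for row in rows])
    best = None
    for var in candidates:
        # Group the target values by the categories of this variable.
        groups = {}
        for row in rows:
            groups.setdefault(row[var], []).append(row[target])
        reduction = base - sum(sse(g) for g in groups.values())
        if best is None or reduction > best[1]:
            best = (var, reduction)
    return best


# Hypothetical rows; "change" stands in for change from original score.
rows = [
    {"gender": "F", "body": "small", "change": 1.0},
    {"gender": "F", "body": "large", "change": 0.8},
    {"gender": "M", "body": "small", "change": -0.5},
    {"gender": "M", "body": "large", "change": -0.7},
]

variable, reduction = best_split(rows, "change", ["gender", "body"])
print(variable, reduction)
```

A full tree repeats this choice inside each resulting group until the groups are too small or too uniform to split further.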
  26. Data Mining Example 1. Problem Definition
      - Improve On-Time Delivery of New Products
  27. Data Mining Example 2. Collect Data
      [Diagrams: Brainstorm Variation Sources; Data Collection Plan]
  28. Data Mining Example 3. Data Review
      - Data Segments
      TOTAL LEAD TIME by Part Type: p < .05

      Level       N     Mean    StDev
      BRACKET     520   x6.76   x3.14
      DUCT        138   x6.70   x0.40
      MANIFOLD    44    x9.95   x4.68
      TUBE        47    x3.60   x2.79

      Pooled StDev = 68.47
      [Interval plot of mean lead time by part type]
  29. Data Mining Example 5. Build Model
  30. Data Mining Example 5. Build Model
      SHIP-DUE = 7.97 + 0.269*(MODEL_CR-DUE) + 0.173*(CR-ISS) + 0.704*(MAN_BOMC) + 0.748*(SCH_ST-MAN) + 0.862*(MOS_MOFIN)
      [R^2A 4.4%] – {R^2A(1) 76.5%, R^2A(2) 68.0%}
      Combined Model: 2 separate regressions (Design and Manufacturing), combined through a common term
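The fitted equation above can be evaluated directly once the five interval terms are known. The coefficients and term names are copied from the slide; the function name, the dictionary layout, and the input values of 1.0 are hypothetical, and the slide does not state the units of the terms.

```python
# Coefficients and intercept as printed on the slide.
INTERCEPT = 7.97
COEFFICIENTS = {
    "MODEL_CR-DUE": 0.269,
    "CR-ISS": 0.173,
    "MAN_BOMC": 0.704,
    "SCH_ST-MAN": 0.748,
    "MOS_MOFIN": 0.862,
}


def predict_ship_minus_due(intervals):
    """Evaluate SHIP-DUE for a dict holding one value per model term."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * intervals[name] for name in COEFFICIENTS
    )


# Hypothetical inputs: every interval set to 1.0, units unknown.
example = {name: 1.0 for name in COEFFICIENTS}
prediction = predict_ship_minus_due(example)
print(round(prediction, 3))
```

Because the model is linear, each coefficient is the change in predicted SHIP-DUE for a one-unit change in that term with the others held fixed.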
  31. Data Mining Example 6. Model Evaluation
      Model Accurately Reflects Delivery Distribution
  32. Data Mining Example 7. Document / Deploy
      Design Release Required for On Time Delivery
      [Chart label: Due Date]
  33. Data Mining Example 7. Document / Deploy
      Update Planning and Automate Tracking
      [Chart labels: Requirements, Plan, Actual]
  34. Data Mining
      - Questions?