Presentation

Transcript

  1. DATA MINING. Team #1: Kristen Durst, Mark Gillespie, Banan Mandura. University of Dayton, MBA 664, 13 APR 09
  2. Data Mining: Outline
       - Introduction
       - Applications / Issues
       - Products
       - Process
       - Techniques
       - Example
  3. Introduction
       - Data Mining Definition
           - Analysis of large amounts of digital data
           - Identify unknown patterns, relationships
           - Draw conclusions AND predict future
       - Data Mining Growth
           - Increase in computer processing speed
           - Decrease in cost of data storage
  4. Introduction
       - High Level Process
           - Summarize the Data
           - Generate Predictive Model
           - Verify the Model
       - Analyst Must Understand
           - The business
           - Data and its origins
           - Analysis methods and results
           - Value provided
  5. Applications / Issues
       - Applications
           - Telecommunications
               - Cell phone contract turnover
           - Credit Card
               - Fraud identification
           - Finance
               - Corporate performance
           - Retail
               - Targeting products to customers
       - Legal and Ethical Issues
           - Aggregation of data to track individual behavior
  6. Data Mining Products
       - Angoss Software (www.angoss.com)
           - Knowledge Seeker/Studio
           - Strategy Builder
       - Infor Global Solutions (www.infor.com)
           - Infor CRM Epiphany
       - Portrait Software (www.portraitsoftware.com)
       - SAS Institute (www.sas.com)
           - SAS Enterprise Miner
           - SAS Analytics
       - SPSS Inc (www.spss.com)
           - Clementine
  7. Angoss Knowledge Studio
  8. SAS Institute
  9. SPSS Inc.
  10. Data Mining Process
       - No uniformly accepted practice
       - 2002 www.KDnuggets.com survey
           - SPSS CRISP-DM
           - SAS SEMMA
  11. Data Mining Process
       - SPSS CRISP-DM
           - CRoss Industry Standard Process for Data Mining
           - Consortium: Daimler-Chrysler, SPSS, NCR
           - Hierarchical Process – Cyclical and Iterative
  12. Data Mining Process
       - CRISP-DM
  13. Data Mining Process
       - SAS SEMMA
           - Model development is focus
           - User defines problem, conditions data outside SEMMA
               - Sample – portion data, statistically
               - Explore – view, plot, subgroup
               - Modify – select, transform, update
               - Model – fit data, any technique
               - Assess – evaluate for usefulness
  14. Data Mining Process
       - Common Steps in Any DM Process
           - 1. Problem Definition
           - 2. Data Collection
           - 3. Data Review
           - 4. Data Conditioning
           - 5. Model Building
           - 6. Model Evaluation
           - 7. Documentation / Deployment
  15. Data Mining Techniques
       - Statistical Methods (Sample Statistics, Linear Regression)
       - Nearest Neighbor Prediction
       - Neural Network
       - Clustering/Segmenting
       - Decision Tree
  16. Statistical Methods
       - Sample Statistics
           - Quick look at the data
           - Ex: Minimum, Maximum, Mean, Median, Variance
       - Linear Regression
           - Easy and works with simple problems
           - May need more complex model using different method
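The sample statistics listed on this slide can be computed with Python's standard library. This is a minimal sketch: the `summarize` helper and the `lead_times` values are hypothetical, invented only for illustration, and are not data from the presentation.

```python
import statistics

def summarize(values):
    """Return the quick-look statistics named on the slide."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample variance (n - 1 denominator), not population variance.
        "variance": statistics.variance(values),
    }

# Hypothetical observations, invented for this example.
lead_times = [3, 7, 7, 10, 13]
summary = summarize(lead_times)
print(summary)
```

A summary like this is only a first look at the data; it says nothing yet about relationships between variables.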
  17. Example: Linear Regression Customer Income
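A simple linear regression of the kind this example slide refers to can be sketched with ordinary least squares. The `fit_line` helper, the variable names, and the income and spending figures below are assumptions made up for illustration; the slide's actual data is not in the transcript.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: customer income (thousands) vs. units purchased.
income = [20, 40, 60, 80, 100]
purchases = [3, 5, 7, 9, 11]
slope, intercept = fit_line(income, purchases)
print(f"purchases = {slope:.3f} * income + {intercept:.3f}")
```

With a single predictor the fit has a closed form, which is part of why the method is quick to apply to simple problems.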
  18. Nearest Neighbor Prediction
       - Easy to understand
       - Used for predicting
       - Works best with few predictor variables
       - Based on the idea that something will behave the same as how others “near” it behave
       - Can also show level of confidence in prediction
  19. Example: Nearest Neighbor [Figure: Product Sales by Population of City and Distance from Competitor. Axes: Distance from Competitor, Population of City. Plotted points are labeled A (> 200 units), B (100 – 200 units), or C (< 100 units), with one point marked U.]
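The nearest neighbor idea in this example can be sketched as a small k-nearest-neighbor vote. The `knn_predict` function, the choice of k = 3, the use of the vote share as the confidence level, and every data point below are assumptions for illustration; the coordinates are hypothetical values scaled to 0..1, not the points from the slide.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Return (label, confidence) from the k training points nearest to query."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    # Confidence here is simply the share of neighbors that agree.
    return label, count / len(nearest)

# Hypothetical cities: (scaled population, scaled distance from competitor).
training = [
    ((0.90, 0.80), "A"), ((0.80, 0.90), "A"), ((0.85, 0.70), "A"),
    ((0.50, 0.50), "B"), ((0.55, 0.40), "B"), ((0.45, 0.45), "B"),
    ((0.10, 0.20), "C"), ((0.20, 0.10), "C"), ((0.15, 0.15), "C"),
]
unknown_city = (0.80, 0.80)
label, confidence = knn_predict(training, unknown_city)
print(label, confidence)
```

Scaling both predictors to a common range matters here, because otherwise the variable with the larger numeric range dominates the distance.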
  20. Neural Network
       - Contains input, hidden and output layer
       - Used when there are large amounts of predictive variables
       - Model can be used again and again once confirmed successful
       - Can be hard to interpret
       - Extremely time consuming to format the data
  21. Example: Neural Network [Figure: network diagram with inputs Population of City and Distance from Competitor, weights W1 = .36 and W2 = .64, output Product Sales Prediction, and the value 0.736.]
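The weighted combination in this example can be sketched as a single neuron. The weights .36 and .64 come from the slide; which input each weight belongs to, the sigmoid activation, the zero bias, and the scaled input values are all assumptions, so this sketch is not expected to reproduce the 0.736 shown on the slide. It also omits the hidden layer a full network would have.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Weights from the slide; their pairing with the inputs is assumed.
weights = (0.36, 0.64)
# Hypothetical inputs scaled to 0..1: (population of city, distance from competitor).
inputs = (0.5, 0.8)
prediction = neuron(inputs, weights)
print(round(prediction, 3))
```

Training a network amounts to adjusting such weights until predictions match known outcomes, which is why the resulting model can be hard to interpret.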
  22. Clustering/Segmenting
       - Not used for prediction
       - Forms groups that are very similar or very different
       - Gives an overall view of the data
       - Can also be used to identify potential problems if there is an outlier
  23. Example: Clustering/Segmenting [Figure: cluster plot along Dimension A, with groups for < 40 years and >= 40 years. Red = Female, Blue = Male.]
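Grouping records into similar segments can be sketched with k-means, one common clustering technique; the presentation does not say which algorithm was used, so this choice is an assumption. The `kmeans` function, its deterministic initialization, and the (age, Dimension A) records are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical records: (age in years, score on Dimension A).
records = [(25, 1.0), (62, 5.0), (30, 1.5), (28, 0.8), (58, 4.5), (65, 5.2)]
centroids, assignments = kmeans(records, k=2)
print(assignments)
```

A record that sits far from every centroid is a candidate outlier, which is the problem-spotting use the previous slide mentions.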
  24. Decision Trees
       - Uses categorical variables
       - Determines what variable is causing the greatest “split” between the data
       - Easy to interpret
       - Not much data formatting
       - Can be used for many different situations
  25. Example: Decision Trees [Figure: decision tree for Change from original score. Split labels: Baseline < 3.75 / Baseline >= 3.75, M / F, Large body type / Small body type. Node values: .14 (n = 115); .58 (n = 67); -.46 (n = 48); -.63 (n = 24); -.29 (n = 24); .76 (n = 51); .47 (n = 28); 1.11 (n = 23).]
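Choosing the variable that causes the greatest split can be sketched as picking the feature with the largest variance reduction in the target. Variance reduction is one common splitting criterion and is an assumption here, since the presentation does not name the criterion used. The function names and the records are hypothetical, loosely modeled on the variables in the figure.

```python
from collections import defaultdict
from statistics import pvariance

def variance_reduction(records, feature, target):
    """Drop in target variance after grouping the records by one feature."""
    values = [r[target] for r in records]
    groups = defaultdict(list)
    for r in records:
        groups[r[feature]].append(r[target])
    # Weighted average of the within-group variances.
    within = sum(len(g) / len(values) * pvariance(g) for g in groups.values())
    return pvariance(values) - within

def best_split(records, features, target):
    """Return the feature whose split separates the target the most."""
    return max(features, key=lambda f: variance_reduction(records, f, target))

# Hypothetical records, invented for this example.
records = [
    {"sex": "F", "baseline": "low", "change": 0.8},
    {"sex": "M", "baseline": "low", "change": 0.6},
    {"sex": "F", "baseline": "low", "change": 0.7},
    {"sex": "M", "baseline": "high", "change": -0.5},
    {"sex": "F", "baseline": "high", "change": -0.4},
    {"sex": "M", "baseline": "high", "change": -0.6},
]
root = best_split(records, ["sex", "baseline"], "change")
print(root)
```

A full tree repeats this choice inside each resulting group until the groups are small or homogeneous.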
  26. Data Mining Example 1. Problem Definition
       - Improve On-Time Delivery of New Products
  27. Data Mining Example 2. Collect Data
       - Brainstorm Variation Sources
       - Data Collection Plan
  28. Data Mining Example 3. Data Review
       - Data Segments
       TOTAL LEAD TIME by Part Type: p < .05

       Level       N    Mean   StDev   ----+---------+---------+---------+--
       BRACKET   520   x6.76   x3.14   (--*-)
       DUCT      138   x6.70   x0.40   (----*---)
       MANIFOLD   44   x9.95   x4.68   (-------*-------)
       TUBE       47   x3.60   x2.79   (------*-------)
                                       ----+---------+---------+---------+--
       Pooled StDev = 68.47
  29. Data Mining Example 5. Build Model
  30. Data Mining Example 5. Build Model
       SHIP-DUE = 7.97 + 0.269*(MODEL_CR-DUE) + 0.173*(CR-ISS) + 0.704*(MAN_BOMC)
                  + 0.748*(SCH_ST-MAN) + 0.862*(MOS_MOFIN)
       [R^2A 4.4%] – {R^2A(1) 76.5%, R^2A(2) 68.0%}
       - Combined Model: 2 separate regressions, Design and Manufacturing, combined through a common term
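The regression equation on this slide can be evaluated directly once the predictor values are known. In this sketch the intercept and coefficients are copied from the slide, while the function name, the treatment of each parenthesized term as one named numeric predictor, and the input values are assumptions; the transcript does not define the terms or their units.

```python
# Intercept and coefficients as printed on the slide.
INTERCEPT = 7.97
COEFFICIENTS = {
    "MODEL_CR-DUE": 0.269,
    "CR-ISS": 0.173,
    "MAN_BOMC": 0.704,
    "SCH_ST-MAN": 0.748,
    "MOS_MOFIN": 0.862,
}

def predict_ship_minus_due(predictors):
    """Evaluate SHIP-DUE for a dict holding one value per predictor term."""
    missing = set(COEFFICIENTS) - set(predictors)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(
        coef * predictors[name] for name, coef in COEFFICIENTS.items()
    )

# Hypothetical input: every predictor set to 1.0, purely for illustration.
example = {name: 1.0 for name in COEFFICIENTS}
print(round(predict_ship_minus_due(example), 3))
```

Raising an error on a missing term keeps a partial record from silently producing a prediction based on only some of the predictors.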
  31. Data Mining Example 6. Model Evaluation
       - Model Accurately Reflects Delivery Distribution
  32. Data Mining Example 7. Document / Deploy
       - Design Release Required for On Time Delivery Due Date
  33. Data Mining Example 7. Document / Deploy
       - Update Planning and Automate Tracking Requirements (Plan vs. Actual)
  34. Data Mining
       - Questions?