
Transcript

  • 1. DATA MINING Team #1 Kristen Durst Mark Gillespie Banan Mandura University of Dayton MBA 664 13 APR 09
  • 2. Data Mining: Outline
    • Introduction
    • Applications / Issues
    • Products
    • Process
    • Techniques
    • Example
  • 3. Introduction
    • Data Mining Definition
      • Analysis of large amounts of digital data
      • Identify unknown patterns, relationships
      • Draw conclusions AND predict future
    • Data Mining Growth
      • Increase in computer processing speed
      • Decrease in cost of data storage
  • 4. Introduction
    • High Level Process
      • Summarize the Data
      • Generate Predictive Model
      • Verify the Model
    • Analyst Must Understand
      • The business
      • Data and its origins
      • Analysis methods and results
      • Value provided
  • 5. Applications / Issues
    • Applications
      • Telecommunications
        • Cell phone contract turnover
      • Credit Card
        • Fraud identification
      • Finance
        • Corporate performance
      • Retail
        • Targeting products to customers
    • Legal and Ethical Issues
      • Aggregation of data to track individual behavior
  • 6. Data Mining Products
    • Angoss Software (www.angoss.com)
      • Knowledge Seeker/Studio
      • Strategy Builder
    • Infor Global Solutions (www.infor.com)
      • Infor CRM Epiphany
    • Portrait Software (www.portraitsoftware.com)
    • SAS Institute (www.sas.com)
      • SAS Enterprise Miner
      • SAS Analytics
    • SPSS Inc. (www.spss.com)
      • Clementine
  • 7. Angoss Knowledge Studio
  • 8. SAS Institute
  • 9. SPSS Inc.
  • 10. Data Mining Process
    • No uniformly accepted practice
    • 2002 www.KDnuggets.com survey
      • SPSS CRISP-DM
      • SAS SEMMA
  • 11. Data Mining Process
    • SPSS CRISP-DM
      • CRoss-Industry Standard Process for Data Mining
      • Consortium: Daimler-Chrysler, SPSS, NCR
      • Hierarchical Process – Cyclical and Iterative
  • 12. Data Mining Process
    • CRISP-DM
  • 13. Data Mining Process
    • SAS SEMMA
      • Focuses on model development
      • The user defines the problem and conditions the data outside SEMMA
        • Sample – take a statistical portion of the data
        • Explore – view, plot, subgroup
        • Modify – select, transform, update
        • Model – fit the data with any technique
        • Assess – evaluate for usefulness (see the sketch below)
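    A minimal Python sketch of the five SEMMA steps, not part of the original deck; the file customers.csv and the columns spend, tenure, segment, and churned are invented for illustration, and logistic regression simply stands in for "any technique":

      import pandas as pd
      from sklearn.model_selection import train_test_split
      from sklearn.preprocessing import StandardScaler
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import accuracy_score

      # Sample: take a statistical portion of the data (hypothetical file/columns)
      df = pd.read_csv("customers.csv")
      sample = df.sample(frac=0.10, random_state=1)

      # Explore: view, plot, subgroup
      print(sample.describe())
      print(sample.groupby("segment")["spend"].mean())

      # Modify: select, transform, update
      sample = sample.dropna(subset=["spend", "tenure", "churned"])
      X = StandardScaler().fit_transform(sample[["spend", "tenure"]])
      y = sample["churned"]

      # Model: fit the data with any technique (logistic regression here)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
      model = LogisticRegression().fit(X_train, y_train)

      # Assess: evaluate the fitted model for usefulness on held-out data
      print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))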
  • 14. Data Mining Process
    • Common Steps in Any DM Process
      • 1. Problem Definition
      • 2. Data Collection
      • 3. Data Review
      • 4. Data Conditioning
      • 5. Model Building
      • 6. Model Evaluation
      • 7. Documentation / Deployment
  • 15. Data Mining Techniques
    • Statistical Methods (Sample Statistics, Linear Regression)
    • Nearest Neighbor Prediction
    • Neural Network
    • Clustering/Segmenting
    • Decision Tree
  • 16. Statistical Methods
    • Sample Statistics
      • Quick look at the data
      • Ex: Minimum, Maximum, Mean, Median, Variance
    • Linear Regression
      • Easy to use and works well for simple problems
      • More complex problems may need a model built with a different method (see the sketch below)
  • 17. Example: Linear Regression – customer income
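    A minimal Python sketch of the two statistical methods above; the customer-income and spend figures are invented, not the data behind slide 17:

      import numpy as np
      from sklearn.linear_model import LinearRegression

      # Hypothetical data: customer income (thousands) vs. annual spend (thousands)
      income = np.array([[30.0], [45.0], [60.0], [75.0], [90.0], [120.0]])
      spend = np.array([2.1, 3.0, 3.8, 4.9, 5.5, 7.2])

      # Sample statistics: a quick first look at the data
      print("min:", spend.min(), "max:", spend.max())
      print("mean:", spend.mean(), "median:", np.median(spend), "variance:", spend.var(ddof=1))

      # Linear regression: easy, and adequate for a simple one-variable problem
      model = LinearRegression().fit(income, spend)
      print("slope:", model.coef_[0], "intercept:", model.intercept_)
      print("predicted spend at income 100:", model.predict([[100.0]])[0])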
  • 18. Nearest Neighbor Prediction
    • Easy to understand
    • Used for predicting
    • Works best with few predictor variables
    • Based on the idea that a record will behave the same way as the records “near” it behave
    • Can also show level of confidence in prediction
  • 19. Example: Nearest Neighbor – scatter plot of product sales by population of city and distance from competitor; points labeled A (> 200 units), B (100 – 200 units), or C (< 100 units), plus one unknown point U to classify
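    A minimal nearest-neighbor sketch in Python for the situation in slide 19; all training values are invented, and scikit-learn's KNeighborsClassifier stands in for whatever tool the presenters used:

      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier

      # Hypothetical cities: [population (thousands), distance from competitor (miles)]
      X = np.array([[250, 12], [300, 15], [280, 3], [120, 10], [100, 4], [60, 8], [50, 2]])
      # Sales class as on the slide: A (> 200 units), B (100-200 units), C (< 100 units)
      y = np.array(["A", "A", "B", "B", "C", "C", "C"])

      # A new city (the unknown point U) is predicted to behave like the cities "near" it
      knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
      new_city = [[260, 9]]
      print("Predicted sales class:", knn.predict(new_city)[0])

      # The class probabilities give a rough confidence level for the prediction
      print(dict(zip(knn.classes_, knn.predict_proba(new_city)[0])))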
  • 20. Neural Network
    • Contains input, hidden and output layer
    • Used when there are large numbers of predictor variables
    • Model can be used again and again once confirmed successful
    • Can be hard to interpret
    • Formatting the data is extremely time-consuming
  • 21. Example: Neural Network – network diagram with inputs Population of City and Distance from Competitor, weights W1 = .36 and W2 = .64, and a Product Sales Prediction output of 0.736
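    A minimal sketch of a one-hidden-layer neural network in Python; the city data are invented and scikit-learn's MLPRegressor is only an assumed stand-in (the slide does not say which tool produced the 0.736 prediction):

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.preprocessing import MinMaxScaler

      # Hypothetical data: inputs are population of city and distance from competitor,
      # target is product sales (units); every value here is invented
      X = np.array([[250, 12], [300, 15], [280, 3], [120, 10], [60, 8], [50, 2]], dtype=float)
      y = np.array([230.0, 260.0, 190.0, 150.0, 80.0, 60.0])

      # Scaling/formatting the inputs is the time-consuming step the slide mentions
      scaler = MinMaxScaler()
      X_scaled = scaler.fit_transform(X)

      # Input layer -> one small hidden layer -> output layer
      net = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=1)
      net.fit(X_scaled, y)

      # Once validated, the fitted model can be reused on new cities
      print(net.predict(scaler.transform([[200.0, 7.0]])))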
  • 22. Clustering/Segmenting
    • Not used for prediction
    • Forms groups whose members are very similar to each other and very different from other groups
    • Gives an overall view of the data
    • Can also be used to identify potential problems if there is an outlier
  • 23. Example: Clustering/Segmenting – plot along Dimension A with groups split at < 40 years vs. >= 40 years; red = female, blue = male
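    A minimal k-means sketch in Python for the kind of segmentation shown in slide 23; the records and the "Dimension A" values are invented:

      import numpy as np
      from sklearn.cluster import KMeans

      # Hypothetical customer records: [age, value along "Dimension A"]
      X = np.array([[25, 1.2], [31, 1.0], [38, 1.4],
                    [45, 3.1], [52, 2.8], [61, 3.4],
                    [29, 9.0]])          # the last record is a deliberate outlier

      # Clustering describes the data (it does not predict): similar records group together
      kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
      print("Cluster labels:", kmeans.labels_)
      print("Cluster centers:", kmeans.cluster_centers_)

      # Records far from their own cluster center can flag outliers / potential problems
      distances = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)
      print("Distance to own center:", distances.round(2))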
  • 24. Decision Trees
    • Uses categorical variables
    • Determines what variable is causing the greatest “split” between the data
    • Easy to interpret
    • Not much data formatting
    • Can be used for many different situations
  • 25. Example: Decision Trees – tree predicting the change from original score (overall mean .14, n = 115); the first split is Baseline < 3.75 (mean -.46, n = 48) vs. Baseline >= 3.75 (mean .58, n = 67), with further splits by sex (M/F) and by large vs. small body type, each node showing its mean change and n
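    A minimal decision-tree sketch in Python echoing the splits in slide 25; the eight records and their values are invented, and scikit-learn's DecisionTreeRegressor stands in for the presenters' tool:

      import pandas as pd
      from sklearn.tree import DecisionTreeRegressor, export_text

      # Hypothetical data: categorical predictors plus a numeric baseline,
      # target is the change from the original score
      df = pd.DataFrame({
          "baseline":  [3.2, 3.5, 3.9, 4.1, 3.6, 4.4, 3.1, 4.0],
          "sex":       ["F", "M", "F", "M", "F", "M", "M", "F"],
          "body_type": ["Small", "Large", "Small", "Large", "Large", "Small", "Small", "Large"],
          "change":    [-0.6, -0.3, 0.5, 1.1, 0.1, 0.8, -0.5, 0.6],
      })

      # Little formatting is needed: just encode the categorical variables
      X = pd.get_dummies(df[["baseline", "sex", "body_type"]])
      tree = DecisionTreeRegressor(max_depth=2, random_state=1).fit(X, df["change"])

      # The printed tree shows which variable produces the greatest split at each level
      print(export_text(tree, feature_names=list(X.columns)))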
  • 26. Data Mining Example 1. Problem Definition
    • Improve On-Time Delivery of New Products
  • 27. Data Mining Example 2. Collect Data – brainstorm variation sources; data collection plan
  • 28. Data Mining Example 3. Data Review
    • Data Segments
    TOTAL LEAD TIME by Part Type (one-way ANOVA, p < .05):
      Level      N    Mean    StDev
      BRACKET    520  x6.76   x3.14
      DUCT       138  x6.70   x0.40
      MANIFOLD   44   x9.95   x4.68
      TUBE       47   x3.60   x2.79
    Pooled StDev = 68.47
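    A minimal sketch of this data-review step in Python, assuming a hypothetical lead_times.csv with part_type and total_lead_time columns (the summary above came from a one-way ANOVA):

      import pandas as pd
      from scipy.stats import f_oneway

      # Hypothetical lead-time records; the file and column names are assumptions
      df = pd.read_csv("lead_times.csv")

      # Summarize total lead time by part-type segment, as in the table above
      print(df.groupby("part_type")["total_lead_time"].agg(["count", "mean", "std"]))

      # One-way ANOVA to check whether the segment means differ (p < .05 on the slide)
      groups = [g["total_lead_time"].values for _, g in df.groupby("part_type")]
      print(f_oneway(*groups))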
  • 29. Data Mining Example 5. Build Model
  • 30. Data Mining Example 5. Build Model
    • SHIP-DUE = 7.97 + 0.269*(MODEL_CR-DUE) + 0.173*(CR-ISS) + 0.704*(MAN_BOMC) + 0.748*(SCH_ST-MAN) + 0.862*(MOS_MOFIN)
    • Adjusted R^2: 4.4% for the combined model; 76.5% and 68.0% for the two component regressions
    • Combined model: two separate regressions (design and manufacturing) joined through a common term
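    A minimal sketch of fitting a regression like the one above in Python, assuming a hypothetical delivery.csv whose column names only echo the terms in the slide's equation (the actual two-stage design/manufacturing model is not reproduced here):

      import pandas as pd
      import statsmodels.api as sm

      # Hypothetical data set; file and column names are assumptions
      df = pd.read_csv("delivery.csv")
      predictors = ["MODEL_CR_DUE", "CR_ISS", "MAN_BOMC", "SCH_ST_MAN", "MOS_MOFIN"]

      # Ordinary least squares: ship-vs-due days as a function of the five predictors
      X = sm.add_constant(df[predictors])
      model = sm.OLS(df["SHIP_DUE"], X).fit()

      # The summary reports the coefficients and adjusted R-squared quoted on the slide
      print(model.summary())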
  • 31. Data Mining Example 6. Model Evaluation – model accurately reflects the delivery distribution
  • 32. Data Mining Example 7. Document / Deploy – design release required for on-time delivery by the due date
  • 33. Data Mining Example 7. Document / Deploy – update planning and automate tracking of requirements (plan vs. actual)
  • 34. Data Mining
    • Questions?