Presentation Transcript

  • 1. DATA MINING
    • Team #1: Kristen Durst, Mark Gillespie, Banan Mandura
    • University of Dayton, MBA 664, 13 APR 09
  • 2. Data Mining: Outline
    • Introduction
    • Applications / Issues
    • Products
    • Process
    • Techniques
    • Example
  • 3. Introduction
    • Data Mining Definition
      • Analysis of large amounts of digital data
      • Identify unknown patterns, relationships
      • Draw conclusions AND predict future
    • Data Mining Growth
      • Increase in computer processing speed
      • Decrease in cost of data storage
  • 4. Introduction
    • High Level Process
      • Summarize the Data
      • Generate Predictive Model
      • Verify the Model
    • Analyst Must Understand
      • The business
      • Data and its origins
      • Analysis methods and results
      • Value provided
  • 5. Applications / Issues
    • Applications
      • Telecommunications
        • Cell phone contract turnover
      • Credit Card
        • Fraud identification
      • Finance
        • Corporate performance
      • Retail
        • Targeting products to customers
    • Legal and Ethical Issues
      • Aggregation of data to track individual behavior
  • 6. Data Mining Products
    • Angoss Software (www.angoss.com)
      • Knowledge Seeker/Studio
      • Strategy Builder
    • Infor Global Solutions (www.infor.com)
      • Infor CRM Epiphany
    • Portrait Software (www.portraitsoftware.com)
    • SAS Institute (www.sas.com)
      • SAS Enterprise Miner
      • SAS Analytics
    • SPSS Inc (www.spss.com)
      • Clementine
  • 7. Angoss Knowledge Studio
  • 8. SAS Institute
  • 9. SPSS Inc.
  • 10. Data Mining Process
    • No uniformly accepted practice
    • 2002 www.KDnuggets.com survey
      • SPSS CRISP-DM
      • SAS SEMMA
  • 11. Data Mining Process
    • SPSS CRISP-DM
      • CRoss-Industry Standard Process for Data Mining
      • Consortium: DaimlerChrysler, SPSS, NCR
      • Hierarchical Process – Cyclical and Iterative
  • 12. Data Mining Process
    • CRISP-DM
  • 13. Data Mining Process
    • SAS SEMMA
      • Model development is focus
      • User defines problem, conditions data outside SEMMA
        • Sample – portion the data statistically
        • Explore – view, plot, subgroup
        • Modify – select, transform, update
        • Model – fit the data, any technique
        • Assess – evaluate for usefulness
  • 14. Data Mining Process
    • Common Steps in Any DM Process
      • 1. Problem Definition
      • 2. Data Collection
      • 3. Data Review
      • 4. Data Conditioning
      • 5. Model Building
      • 6. Model Evaluation
      • 7. Documentation / Deployment
  • 15. Data Mining Techniques
    • Statistical Methods (Sample Statistics, Linear Regression)
    • Nearest Neighbor Prediction
    • Neural Network
    • Clustering/Segmenting
    • Decision Tree
  • 16. Statistical Methods
    • Sample Statistics
      • Quick look at the data
      • Ex: Minimum, Maximum, Mean, Median, Variance
    • Linear Regression
      • Easy to apply; works well on simple problems
      • Complex problems may need a different method
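Both statistical methods above can be sketched in a few lines of Python. All the figures here (customer incomes and ages) are hypothetical, invented purely for illustration:

```python
import statistics

# Hypothetical customer incomes, invented for illustration
incomes = [32_000, 41_500, 55_000, 47_200, 60_800, 38_900]

# Sample statistics: a quick look at the data
lo, hi = min(incomes), max(incomes)
mean, median = statistics.mean(incomes), statistics.median(incomes)
spread = statistics.variance(incomes)

# Simple least-squares linear regression with one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
def linear_fit(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope  # (intercept, slope)

ages = [25, 32, 40, 36, 48, 29]     # hypothetical predictor values
intercept, slope = linear_fit(ages, incomes)
predicted = intercept + slope * 35  # predicted income for a 35-year-old
```

When the fit is poor, that is the cue from the slide above: move to a more complex model or a different technique.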
  • 17. Example: Linear Regression – Customer Income
  • 18. Nearest Neighbor Prediction
    • Easy to understand
    • Used for predicting
    • Works best with few predictor variables
    • Based on the idea that something will behave the same as how others “near” it behave
    • Can also show level of confidence in prediction
  • 19. Example: Nearest Neighbor
    • Chart: Product Sales by Population of City and Distance from Competitor
    • A: > 200 units, B: 100 – 200 units, C: < 100 units
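A minimal nearest-neighbor sketch of the idea above, in Python. The city records are hypothetical and only stand in for the A/B/C sales classes in the chart; the confidence is just the fraction of neighbors that agree:

```python
import math
from collections import Counter

# Hypothetical training data:
# (population of city, distance from competitor) -> sales class
history = [
    ((120_000, 8.0), "A"), ((95_000, 6.5), "A"), ((110_000, 9.1), "A"),
    ((60_000, 4.0), "B"), ((55_000, 5.2), "B"), ((70_000, 3.8), "B"),
    ((20_000, 1.5), "C"), ((25_000, 2.2), "C"), ((18_000, 1.0), "C"),
]

def predict(point, k=3):
    """Classify a new point by majority vote of its k nearest neighbors."""
    nearest = sorted(history, key=lambda rec: math.dist(point, rec[0]))[:k]
    votes = Counter(cls for _, cls in nearest)
    cls, count = votes.most_common(1)[0]
    return cls, count / k  # predicted class and a rough confidence

cls, confidence = predict((100_000, 7.0))
```

With few predictor variables, as the slide notes, distance is cheap to compute and easy to reason about; with many variables the distances become less meaningful.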
  • 20. Neural Network
    • Contains input, hidden and output layer
    • Used when there are large amounts of predictive variables
    • Model can be used again and again once confirmed successful
    • Can be hard to interpret
    • Extremely time-consuming to format the data
  • 21. Example: Neural Network
    • Diagram: inputs Population of City and Distance from Competitor, weights W1 = .36 and W2 = .64, Product Sales Prediction output 0.736
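A toy forward pass through the input, hidden and output layers described above. All weights and inputs here are made-up stand-ins; a real network learns its weights from data (e.g. by backpropagation), which is where the training time goes:

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: input layer -> hidden layer -> output layer."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_weights
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical weights, for illustration only
hidden_weights = [[0.36, 0.64], [-0.5, 0.9]]  # two hidden units, two inputs
output_weights = [0.8, -0.3]

# Inputs (population, distance from competitor) scaled to [0, 1]
score = forward([0.7, 0.2], hidden_weights, output_weights)
```

The hard-to-interpret part the slide mentions is visible even here: the hidden weights have no direct business meaning.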
  • 22. Clustering/Segmenting
    • Not used for prediction
    • Forms groups whose members are similar to each other and different from other groups
    • Gives an overall view of the data
    • Can also surface potential problems, such as outliers
  • 23. Example: Clustering/Segmenting
    • Plot: Dimension A by age (< 40 years vs. >= 40 years); Red = Female, Blue = Male
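A bare-bones k-means sketch of the grouping idea above. The (age, Dimension A) points are hypothetical; k-means is one common clustering method, not necessarily the one behind the slide's plot:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign points to nearest centroid, recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D data (age, Dimension A) with two obvious groups
points = [(25, 1.0), (28, 1.2), (31, 0.9), (52, 4.8), (47, 5.1), (55, 5.0)]
centroids, clusters = kmeans(points, k=2)
```

An outlier would show up as a point far from every centroid, which is the "potential problems" use the slide describes.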
  • 24. Decision Trees
    • Works with categorical variables
    • Determines which variable produces the greatest “split” in the data
    • Easy to interpret
    • Not much data formatting
    • Can be used for many different situations
  • 25. Example: Decision Trees
    • Tree diagram: root node (change from original score: .14, n = 115) split by sex (M/F), baseline score (< 3.75 vs. >= 3.75), and body type (large vs. small); leaf means range from -.63 (n = 24) to 1.11 (n = 23)
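The "greatest split" idea can be sketched by scoring each categorical variable on how much variance in the target remains after splitting on it; the variable leaving the least behind makes the best first split. The records below are hypothetical, loosely echoing the sex/body-type variables in the example:

```python
import statistics

# Hypothetical records: categorical predictors plus a numeric score to explain
records = [
    {"sex": "F", "body": "small", "score": -0.6},
    {"sex": "F", "body": "large", "score": -0.3},
    {"sex": "M", "body": "small", "score": 0.5},
    {"sex": "M", "body": "large", "score": 1.1},
    {"sex": "M", "body": "small", "score": 0.4},
    {"sex": "F", "body": "large", "score": -0.2},
]

def split_variance(records, var):
    """Weighted within-group variance of the score after splitting on var."""
    groups = {}
    for r in records:
        groups.setdefault(r[var], []).append(r["score"])
    n = len(records)
    return sum(len(g) / n * statistics.pvariance(g) for g in groups.values())

# The best first split is the variable that leaves the least variance behind
best = min(["sex", "body"], key=lambda v: split_variance(records, v))
```

Real tree algorithms repeat this greedily at every node and add stopping rules, but the per-node decision is just this comparison.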
  • 26. Data Mining Example 1. Problem Definition
    • Improve On-Time Delivery of New Products
  • 27. Data Mining Example 2. Collect Data
    • Brainstorm variation sources
    • Data collection plan
  • 28. Data Mining Example 3. Data Review
    • Data Segments
    TOTAL LEAD TIME by Part Type: p < .05

      Level      N    Mean   StDev
      BRACKET   520   x6.76  x3.14
      DUCT      138   x6.70  x0.40
      MANIFOLD   44   x9.95  x4.68
      TUBE       47   x3.60  x2.79

      Pooled StDev = 68.47
  • 29. Data Mining Example 5. Build Model
  • 30. Data Mining Example 5. Build Model
    • SHIP-DUE = 7.97 + 0.269*(MODEL_CR-DUE) + 0.173*(CR-ISS) + 0.704*(MAN_BOMC) + 0.748*(SCH_ST-MAN) + 0.862*(MOS_MOFIN)
    • [R^2A 4.4%] – {R^2A(1) 76.5%, R^2A(2) 68.0%}
    • Combined model: 2 separate regressions (Design and Manufacturing) joined through a common term
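Once fitted, the model above is just a weighted sum, so scoring a new program is straightforward. Only the intercept and coefficients come from the slide; the interval values in `example` are hypothetical:

```python
# Coefficients as given on the slide; keys name the model's interval terms
COEFFS = {
    "MODEL_CR-DUE": 0.269,
    "CR-ISS": 0.173,
    "MAN_BOMC": 0.704,
    "SCH_ST-MAN": 0.748,
    "MOS_MOFIN": 0.862,
}
INTERCEPT = 7.97

def predict_ship_minus_due(intervals):
    """Evaluate the fitted regression for one new-product record."""
    return INTERCEPT + sum(COEFFS[k] * v for k, v in intervals.items())

# Hypothetical interval values (days) for one program
example = {
    "MODEL_CR-DUE": 10, "CR-ISS": 5, "MAN_BOMC": 3,
    "SCH_ST-MAN": 4, "MOS_MOFIN": 2,
}
days_late = predict_ship_minus_due(example)
```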
  • 31. Data Mining Example 6. Model Evaluation
    • Model accurately reflects the delivery distribution
  • 32. Data Mining Example 7. Document / Deploy
    • Design release required for on-time delivery by the due date
  • 33. Data Mining Example 7. Document / Deploy
    • Update planning and automate tracking of requirements (plan vs. actual)
  • 34. Data Mining
    • Questions?