Data Mining with Clementine

  • SPSS Inc. Copyright 2003-4, SPSS Inc.
  • Adding a node in Clementine is relatively simple: just select the node you want from the palette menus and drag and drop it on to the canvas.

    1. 1. <ul><li>Data Mining with Clementine </li></ul>Girish Punj, Professor of Marketing, School of Business, University of Connecticut
    2. 2. <ul><li>Agenda </li></ul><ul><li>How to introduce data mining to students </li></ul><ul><li>Why Clementine? </li></ul><ul><li>Clementine features and capabilities </li></ul><ul><li>A typical data mining class </li></ul><ul><li>Useful teaching resources </li></ul><ul><li>Questions? </li></ul>
    3. 3. <ul><li>Introduce Data Mining to Students </li></ul><ul><li>“Data mining chosen as one of top 10 emerging technologies...” (MIT Technology Review) </li></ul><ul><li>“Data mining expertise is most sought after...” (Information Week Survey) </li></ul><ul><li>Data mining skills are an important part of the “toolkit” needed by managers in a complex business world </li></ul><ul><li>Data mining for job advancement and as career insurance during good and bad economic times </li></ul>
    4. 4. <ul><li>Introduce Data Mining to Students </li></ul><ul><li>“When I looked at what companies were doing with analytics I found it had moved from the back room to the board room…a number of companies weren’t just using analytics, they were now competing on analytics -- they had made analytics the central strategy of their business.” </li></ul><ul><ul><li>(Tom Davenport, author of ‘Competing on Analytics’) </li></ul></ul><ul><li>“We are drowning in information but starved for knowledge.” </li></ul><ul><ul><li>(John Naisbitt, author of ‘Megatrends’) </li></ul></ul>
    5. 5. <ul><li>Applications: Retail </li></ul><ul><li>Use data mining to understand customers’ wants, needs, and preferences </li></ul><ul><li>Based on this information, deliver timely, personalized promotional offers </li></ul>
    6. 6. <ul><li>Applications: Insurance </li></ul><ul><li>Leverage data and text mining to speed claims processing and help reduce fraud </li></ul>
    7. 7. <ul><li>Applications: Manufacturing </li></ul><ul><li>Model historical production and quality data to reduce development time and improve quality of production processes </li></ul>
    8. 8. <ul><li>Applications: Telecom </li></ul><ul><li>Use data mining to identify appropriate customer segments for new marketing initiatives </li></ul><ul><li>Predict likelihood of customer churn and target those likely to leave with retention campaigns </li></ul>
    9. 9. <ul><li>Metaphor: Data Mining and Gold Mining </li></ul>
    10. 10. <ul><li>Data Mining and Knowledge Discovery </li></ul><ul><li>Data mining is the process of discovery of interesting, meaningful, and actionable patterns hidden in large amounts of data (Han and Kamber 2006) </li></ul><ul><li>Knowledge Discovery (KD) as a more inclusive term </li></ul><ul><li>Knowledge Discovery using a combination of artificial and human intelligence </li></ul><ul><li>Data -> Information -> Knowledge </li></ul>
    11. 11. <ul><li>Data Mining and Statistics </li></ul><ul><li>Data Mining </li></ul><ul><ul><li>No hypotheses are needed </li></ul></ul><ul><ul><li>Can find patterns in very large amounts of data </li></ul></ul><ul><ul><li>Uses all the data available </li></ul></ul><ul><ul><li>Terminology used: field, record, supervised learning, unsupervised learning </li></ul></ul><ul><li>Statistics </li></ul><ul><ul><li>Uses Hypothesis testing </li></ul></ul><ul><ul><li>Techniques are not suitable for large datasets </li></ul></ul><ul><ul><li>Relies on sampling </li></ul></ul><ul><ul><li>Terminology used: variable, observation, analysis of dependence, analysis of interdependence </li></ul></ul>
    12. 12. <ul><li>Deal with Numerophobia </li></ul><ul><li>Emphasize the differences between statistics and data mining to advantage (no probability distributions) </li></ul><ul><li>Use a math primer for numerically challenged students </li></ul>http://www.youtube.com/watch?v=nRKzseCLja8
    13. 13. <ul><li>Introduce Software to Students </li></ul><ul><li>Clementine 12.0: </li></ul><ul><ul><li>Student Version (Clementine GradPack) is of enterprise strength </li></ul></ul><ul><ul><li>Student License extends for about eight months beyond course completion date </li></ul></ul><ul><ul><li>Directly address cost concerns by discussing value of “investment” </li></ul></ul>
    14. 14. <ul><li>Who was Clementine? </li></ul><ul><li>Daughter of a miner during the 1849 California Gold Rush who developed a reputation… </li></ul><ul><li>“In a cavern, in a canyon, Excavating for a mine Dwelt a miner, forty niner, And his daughter Clementine…” </li></ul>http://www.empire.k12.ca.us/capistrano/mike/capmusic/the_wild_west/gold_rush/clemtine.mid
    15. 15. <ul><li>Introduce Software to Students </li></ul><ul><li>Visual approach makes model building an art form </li></ul><ul><li>Concept of “data flow” enables building of multiple models </li></ul><ul><li>Point-and-click model building (no manual coding) </li></ul><ul><li>Comprehensive portfolio of models for the Business Analyst as well as the Technical Expert </li></ul>
    16. 16. <ul><li>Clementine Basics: Building a Model </li></ul>
    17. 17. <ul><li>Clementine Basics: Select a Data Source </li></ul>
    18. 18. <ul><li>Clementine Basics: Select a Data File </li></ul>
    19. 19. <ul><li>Clementine Basics: Select a Data File </li></ul>
    20. 20. <ul><li>Clementine Basics: Read a Data File </li></ul>
    21. 21. <ul><li>Clementine Basics: Select Fields </li></ul>
    22. 22. <ul><li>Clementine Basics: Define Field Types </li></ul>
    23. 23. <ul><li>Clementine Basics: Visualize Data </li></ul><ul><ul><ul><li>Create tables and charts for means, ranges, and correlations of all variables </li></ul></ul></ul>
    24. 24. <ul><li>Clementine Basics: Visualize Data </li></ul><ul><li>Examine associations among variables using visual displays </li></ul>
    25. 25. <ul><li>Clementine Basics: Select Target and Predictors </li></ul>
    26. 26. <ul><li>Clementine Basics: Execute Model </li></ul>
    27. 27. <ul><li>Clementine Basics: Review Model Results </li></ul>
    28. 28. <ul><li>Building Models in Clementine </li></ul><ul><li>Up sell/Cross sell: Creating business rules for up sell and cross sell </li></ul><ul><li>Customer Churn: Identify and target likely churn candidates, and create retention offerings to decrease their likelihood to churn </li></ul><ul><li>Propensity to respond/purchase: Develop models on desired purchase behavior, and target candidates that are most likely to respond </li></ul>
    29. 29. <ul><li>A Typical Clementine Model </li></ul>
    30. 30. <ul><li>Modeling Approaches </li></ul><ul><li>Can use auto “c.h.d” settings (beginning user) </li></ul><ul><li>But can also use expert capabilities (advanced user) </li></ul>
    31. 31. <ul><li>Data Mining Procedures </li></ul><ul><li>Estimation </li></ul><ul><li>Prediction </li></ul><ul><li>Classification </li></ul><ul><li>Clustering </li></ul><ul><li>Affinity/Association </li></ul>
    32. 32. <ul><li>Specific Methodologies Available </li></ul><ul><li>Estimation & Prediction: </li></ul><ul><ul><li>Neural networks </li></ul></ul><ul><li>Classification: </li></ul><ul><ul><li>Decision trees (2 types) </li></ul></ul>
    33. 33. <ul><li>Specific Methodologies Available </li></ul><ul><li>Clustering: </li></ul><ul><ul><li>K-means </li></ul></ul><ul><ul><li>Kohonen networks </li></ul></ul><ul><li>Affinity/Association: </li></ul><ul><ul><li>Association rules (2 types) </li></ul></ul>
    34. 34. <ul><li>Positioning the Course </li></ul>[Diagram labels: Theory and Concepts; Business Applications; Clementine Models; Focus of the Course]
    35. 35. <ul><li>A Typical Class </li></ul><ul><li>Discuss business applications of methodology based on brief articles from the business press (30 minutes) </li></ul><ul><li>Present theory and concepts (30 minutes) </li></ul><ul><li>Build a Clementine model for students (30 minutes) </li></ul><ul><li>Ask students to build a Clementine model (30 minutes) </li></ul><ul><li>Discuss homework assignment (15 minutes) </li></ul><ul><li>Students complete a homework assignment after class (requires three hours) </li></ul>
    36. 36. <ul><li>Discuss Business Applications </li></ul><ul><li>“Wal-Mart's next competitive weapon is advanced data mining, which it will use to forecast, replenish and merchandise on a micro scale </li></ul><ul><li>By analyzing years' worth of sales data--and then cranking in variables such as the weather and school schedules--the system could predict the optimal number of cases of Gatorade, in what flavors and sizes, a store in Laredo, Texas, should have on hand the Friday before Labor Day </li></ul><ul><li>Then, if the weather forecast suddenly called for temperatures 5° hotter than last year, the delivery truck would automatically show up with more” </li></ul><ul><li>From: “Can Wal-Mart Get Any Bigger,” Time, 13 January, 2003 </li></ul>
    37. 37. <ul><li>Present Theory and Concepts </li></ul><ul><li>Where should detergents be placed in the store to maximize their sales? </li></ul><ul><li>Are window cleaning products also purchased when detergents and orange juice are bought together? </li></ul><ul><li>Is soda typically purchased with bananas? Does the brand of soda make a difference? </li></ul><ul><li>How are the demographics of the neighborhood affecting what customers are buying? </li></ul>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
    38. 38. <ul><li>Present Theory and Concepts </li></ul><ul><li>Start with a record of past purchase transactions that link items purchased together </li></ul>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
    39. 39. <ul><li>Present Theory and Concepts </li></ul><ul><li>Create a co-occurrence matrix that pairs items purchased together in the form of a table </li></ul><ul><li>The co-occurrence matrix shows the number of times the “row” item was purchased with the “column” item (note that the matrix is symmetrical) </li></ul>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
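As a rough illustration of the co-occurrence matrix described on this slide, the sketch below counts item pairs for the five example baskets used elsewhere in this deck. The function name `co_occurrence` and the choice to put each item's own basket count on the diagonal are assumptions made for this example, not something the slides specify.

```python
from collections import Counter
from itertools import combinations

# The five example baskets from the Berry and Linoff slides in this deck.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def co_occurrence(baskets):
    """Count how many baskets contain each (row item, column item) pair.

    Assumption: the diagonal entry (item, item) holds the number of
    baskets that contain the item at all.
    """
    counts = Counter()
    for basket in baskets:
        for item in basket:
            counts[(item, item)] += 1
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # mirror the count so the matrix is symmetric
    return counts


matrix = co_occurrence(transactions)
items = sorted({item for basket in transactions for item in basket})

# Print one row per item; columns follow the same item order.
print("columns:", items)
for row in items:
    print(row.ljust(15), [matrix[(row, col)] for col in items])
```

Because every pair is counted in both directions, the entry for (OJ, soda) always equals the entry for (soda, OJ), which is the symmetry the slide points out.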
    40. 40. <ul><li>Present Theory and Concepts </li></ul><ul><li>Rule Support = Percentage of transactions with both the items of interest </li></ul><ul><li>What is the Support for the rule “If Soda, then OJ”? </li></ul><ul><ul><li>OJ and Soda are purchased together in 2 out of 5 transactions </li></ul></ul><ul><ul><li>Hence Support is 40% </li></ul></ul><ul><li>What is the Support for the rule “If OJ, then Soda”? </li></ul><ul><ul><li>Still 40% </li></ul></ul><table><tr><th>Customer</th><th>Items Purchased</th></tr><tr><td>1</td><td>OJ, soda</td></tr><tr><td>2</td><td>Milk, OJ, window cleaner</td></tr><tr><td>3</td><td>OJ, detergent</td></tr><tr><td>4</td><td>OJ, detergent, soda</td></tr><tr><td>5</td><td>Window cleaner, soda</td></tr></table>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
    41. 41. <ul><li>Present Theory and Concepts </li></ul><ul><li>Confidence = Ratio of the number of transactions with both the items of interest to the number of transactions with the “If” items </li></ul><ul><li>What is the Confidence for “If Soda, then OJ”? </li></ul><ul><ul><li>2 out of 3 soda purchase transactions also include OJ </li></ul></ul><ul><ul><li>Hence Confidence is 66.67% </li></ul></ul><ul><li>What is the Confidence for “If OJ, then Soda”? </li></ul><ul><ul><li>2 out of 4 OJ purchase transactions also include soda </li></ul></ul><ul><ul><li>Hence Confidence is 50% </li></ul></ul><table><tr><th>Customer</th><th>Items Purchased</th></tr><tr><td>1</td><td>OJ, soda</td></tr><tr><td>2</td><td>Milk, OJ, window cleaner</td></tr><tr><td>3</td><td>OJ, detergent</td></tr><tr><td>4</td><td>OJ, detergent, soda</td></tr><tr><td>5</td><td>Window cleaner, soda</td></tr></table>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
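The rule support and confidence figures for the soda and OJ rules can be checked with a few lines of Python. This is a minimal sketch under the definitions given on these slides; the function names `rule_support` and `confidence` are chosen for the example and do not correspond to any Clementine API.

```python
# The five example baskets from the slides.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def rule_support(baskets, antecedent, consequent):
    """Fraction of all baskets that contain both items of the rule."""
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    return both / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets with the "If" item that also contain the "then" item.

    Assumes the antecedent occurs in at least one basket.
    """
    with_antecedent = [b for b in baskets if antecedent in b]
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


for if_item, then_item in [("soda", "OJ"), ("OJ", "soda")]:
    s = rule_support(transactions, if_item, then_item)
    c = confidence(transactions, if_item, then_item)
    print(f"If {if_item}, then {then_item}: support={s:.2%}, confidence={c:.2%}")
```

Rule support is the same in both directions because it only counts baskets holding both items, while confidence changes with the direction because the denominator is the count of baskets holding the "If" item.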
    42. 42. <ul><li>Present Theory and Concepts </li></ul><ul><li>Support (Prevalence): Percentage of records in the dataset that match the antecedent </li></ul><ul><ul><li>Support = p(antecedent) </li></ul></ul>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
    43. 43. <ul><li>Present Theory and Concepts </li></ul><ul><li>Confidence (Predictability): Percentage of records matching the antecedent that also match the consequent </li></ul><ul><ul><li>Confidence = p(antecedent and consequent) / p(antecedent) </li></ul></ul>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
    44. 44. <ul><li>Present Theory and Concepts </li></ul><ul><li>Lift (Improvement): How much better a rule is at predicting the consequent than chance alone </li></ul><ul><ul><li>Lift = confidence / p(consequent) </li></ul></ul><ul><ul><li>A rule is only useful if Lift is > 1 </li></ul></ul>From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
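To make the lift formula concrete, the sketch below applies it to the soda and OJ rules from the five example baskets shown earlier in the deck. The helper names `probability`, `confidence`, and `lift` are assumptions for this illustration; it is not how Clementine computes or reports these measures internally.

```python
# The five example baskets from the earlier slides.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def probability(baskets, item):
    """p(item): fraction of baskets that contain the item."""
    return sum(1 for b in baskets if item in b) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """p(antecedent and consequent) / p(antecedent)."""
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    return (both / len(baskets)) / probability(baskets, antecedent)


def lift(baskets, antecedent, consequent):
    """confidence / p(consequent); values above 1 beat chance alone."""
    return confidence(baskets, antecedent, consequent) / probability(baskets, consequent)


for if_item, then_item in [("soda", "OJ"), ("OJ", "soda")]:
    value = lift(transactions, if_item, then_item)
    verdict = "useful" if value > 1 else "no better than chance"
    print(f"If {if_item}, then {then_item}: lift={value:.2f} ({verdict})")
```

On this tiny dataset both rules have a lift of about 0.83, so neither improves on simply guessing the consequent: OJ is in 4 of 5 baskets overall but in only 2 of the 3 baskets that contain soda.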
    45. 45. <ul><li>Build a Clementine Model </li></ul>
    46. 46. <ul><li>Homework Assignment </li></ul><ul><li>Conduct a Market Basket Analysis on the dataset using both the Apriori and GRI modeling nodes in Clementine. </li></ul><ul><li>Reconcile the association rules discovered as a result of the Apriori and GRI modeling nodes. </li></ul><ul><li>Provide a narrative description that attempts to explain the convergence (or lack thereof) between the results obtained from the two modeling nodes.  </li></ul><ul><li>Select those association rules discovered during your Market Basket Analysis that would make the most intuitive sense to the category managers involved and create demographic profiles of shoppers who appear to fit those rules. </li></ul>
    47. 47. <ul><li>Instructor’s Laptop Screen </li></ul>
    48. 48. <ul><li>Student’s Laptop Screen </li></ul>
    49. 49. <ul><li>Resources </li></ul><ul><li>“Data Mining Techniques” by Michael J. A. Berry and Gordon S. Linoff (second edition), Wiley, 2004 </li></ul><ul><li>“Discovering Knowledge in Data” by Daniel T. Larose, Wiley, 2005 </li></ul><ul><li>“Making Sense of Statistics” by Fred Pyrczak (fourth edition), Pyrczak Publishing, 2006 </li></ul><ul><li>Recent articles from the business press identified using the “Factiva” database with “data mining” and “predictive analytics” as search keywords </li></ul><ul><li>www.kdnuggets.com </li></ul>
    50. 50. <ul><li>Thank you for your time and participation </li></ul><ul><li>Questions? </li></ul><ul><li>Additional Information: Please see my syllabus at http://www.spss.com/academic/educator/curriculum/index.htm?tab=1 </li></ul><ul><li>Comments and suggestions are welcome. Please send them to: [email_address] </li></ul>
