Adventures in Segmentation: Using Applied Data Mining to Add Business Value
Drew Minkin

  1. Adventures in Segmentation: Using Applied Data Mining to Add Business Value
     Drew Minkin
  2. Agenda
     - The Value Add of Data Mining
     - Segmentation 101
     - Segmentation Tools in Analysis Services
     - Methodology for Segmentation Analysis
     - Building Confidence in your Model
  3. The Value Add of Data Mining
  4. Value Add - What is Data Mining?
     - Statistics for the computer age
     - Evolution, not revolution, with respect to traditional statistics
     - Statistics enriched with the brute-force capabilities of modern computing
     - Associated with industrial-sized data sets
  5. Value Add - Data Mining in the BI Spectrum
     [Diagram: the SQL Server 2008 BI spectrum, from static reports through ad hoc reports and OLAP to data mining, plotted as relative business value (easy to difficult) against business knowledge]
  6. Value Add - Data Mining and Democracy
     - VoterVault: from the mid-1990s; a massive get-out-the-vote drive for those expected to vote Republican
     - Demzilla: names typically have 200 to 400 information items
  7. Value Add - The Promise of Data Mining
     "The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions."
     -- Ian Hacking
  8. Value Add - Spheres of Influence
  9. Value Add - Operational Benefits
     - Improved efficiency
     - Inventory management
     - Risk management
  10. Value Add - Strategic Benefits
      - The bottom line
      - Increased agility
      - Brand building
      - Differentiated messaging
      - "Relationship" building
  11. Value Add - Tactical Benefits
      - Reduction of costs
      - Transactional leakage
      - Outlier analysis
  12. Value Add - Customer Attrition Analysis
      - Identify a group of customers who are expected to attrite
      - Conduct marketing campaigns to change their behavior in the desired direction and reduce the attrition rate
  13. Value Add - Target Result
      - Slow attriters: customers who slowly pay down their outstanding balance until they become inactive
      - Fast attriters: customers who quickly pay down their balance and either lapse it or close it via phone call or write-in
  14. Value Add - Sample Applications
      - Credit models
      - Retention models
      - Elasticity models
      - Cross-sell models
      - Lifetime value models
      - Agent/agency monitoring
      - Target marketing
      - Fraud detection
  15. Segmentation 101
  16. Segmentation - Machine Learning
      - Unsupervised learning: finds associations and patterns among many entities without target information, e.g. market basket analysis ("diapers and beer")
      - Supervised learning: predicts the value of a target variable from well-defined predictive variables, e.g. credit / non-credit scoring engines
  17. Segmentation - Sample Data Sources
      - Data warehouse: a credit card data warehouse containing about 200 product-specific fields
      - Third-party data: a set of account-related demographic and credit bureau information
      - Segmentation files: account-related segmentation values based on the client's segmentation scheme, which combines risk, profitability and external potential
      - Payment database: stores all checks processed and can categorize the source of checks
  18. Methodology for Segmentation Analysis
  19. Methodology - Distribution of Effort
  20. Methodology - Segmentation Lifecycle
  21. Methodology - Acquiring Raw Data
      - Research and evaluate possible data sources: availability, hit rate, implementability, cost-effectiveness
      - Extract or purchase the data
      - Check the data for quality (QA)
      - At this stage the data is still in a "raw" form; you often start with voluminous transactional data
      - Much of the data mining process is "messy"
  22. Methodology - Goals of Refinement
      - Reflect data changes over time
      - Recognize and remove statistically insignificant fields
      - Define and introduce the "target" field
      - Allow for second-stage preprocessing and statistical analysis
  23. Methodology - Scoring Engines
      - A scoring engine is a formula that classifies or separates policies (or risks, accounts, agents...) into profitable vs. unprofitable, retaining vs. non-retaining, and so on
      - A (non-)linear function f() of several predictive variables
      - Produces a continuous range of scores: score = f(X1, X2, ..., XN)
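The scoring-engine idea above can be sketched in a few lines. This is a minimal illustration assuming a linear f(); the weights, bias, and the two predictive variables (tenure, balance ratio) are made up for the example.

```python
# A minimal sketch of a linear scoring engine: score = f(x1, ..., xN).
# Weights and variable meanings are illustrative, not from the deck.

def score(weights, bias, features):
    """Return a continuous score for one case (higher = more likely to retain)."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical model: two predictive variables (tenure in years, balance ratio).
weights = [0.8, -1.5]
bias = 0.1

cases = [(3.0, 0.2), (0.5, 0.9)]
scores = [score(weights, bias, f) for f in cases]
# Cases can then be ranked, thresholded, or cut into deciles to form segments.
```

In practice the function f() would come from a fitted model (regression, tree, neural net), but the deployment shape is the same: one row of predictive variables in, one continuous score out.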
  24. Methodology - Deployed Model
      [Diagram: training data (DB data, client data, application logs) is used to build mining models; at prediction time a single new entry or transaction ("just one row") is passed to the DM engine, which returns the predicted data]
  25. Methodology - Testing
      - Randomly divide the data into three pieces: training data, test data and validation data
      - Use the training data to fit models
      - Score the test data to create a lift curve
      - Perform the train/test steps iteratively until you have a model you're happy with
      - During this iterative phase, the validation data is set aside in a "lock box"
      - Finally, score the validation data and produce a lift curve: an unbiased estimate of future performance
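The random three-way split above can be sketched as follows. The 60/20/20 proportions are an assumption; the deck does not specify them.

```python
# A minimal sketch of the train/test/validation split described above.
import random

def three_way_split(rows, train_frac=0.6, test_frac=0.2, seed=42):
    """Randomly divide rows into training, test and validation pieces."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    validation = rows[n_train + n_test:]  # the "lock box", untouched until the end
    return train, test, validation

train, test, validation = three_way_split(range(1000))
```

Only the training and test pieces participate in the iterative model-building loop; the validation piece is scored once, at the end, to get the unbiased lift estimate.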
  26. Methodology - Multivariate Analysis
      - Examine correlations among the variables
      - Weed out redundant, weak or poorly distributed variables
      - Design the model
      - Build candidate models: regression/GLM, decision trees/MARS, neural networks
      - Select the final model
  27. Segmentation Tools in Analysis Services
  28. Data Mining - Algorithm Matrix
      - Tasks: segmentation, advanced data exploration, classification, forecasting, association, text analysis, estimation
      - Algorithms: Association Rules, Clustering, Decision Trees, Linear Regression, Logistic Regression, Naïve Bayes, Neural Nets, Sequence Clustering, Time Series
      [Matrix slide mapping each task to the algorithms that support it]
  29. Data Mining - SQL Server Algorithms
      - Decision Trees
      - Time Series
      - Neural Net
      - Clustering
      - Sequence Clustering
      - Association
      - Naïve Bayes
      - Linear and Logistic Regression
  30. Data Mining - Blueprint for Toolset
      - Offline and online modes; everything you do stays on the server
      - Offline mode requires server admin privileges to deploy
      - Workflow:
        1. Define data sources and data source views
        2. Define mining structures and models
        3. Train (process) the structures
        4. Verify accuracy
        5. Explore and visualise
        6. Perform predictions
        7. Deploy for other users
        8. Regularly update and re-validate the model
  38. Data Mining - Cross-Validation
      - New in SQL Server 2008
      - X iterations of retraining and retesting the model
      - Results from each test are statistically collated
      - The model is deemed accurate (and perhaps reliable) when variance is low and results meet expectations
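The retrain/retest/collate loop above can be sketched as plain k-fold cross-validation. The "model" here is a deliberately trivial mean predictor, purely so the shape of the loop is visible; SSAS does the equivalent internally over a mining model.

```python
# A minimal sketch of k-fold cross-validation: retrain and retest k times,
# then collate the per-fold results and check their variance.
import statistics

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_validate(xs, ys, k=5):
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        prediction = statistics.mean(ys[i] for i in train)   # "retrain"
        fold_error = statistics.mean(abs(ys[i] - prediction) for i in test)  # "retest"
        errors.append(fold_error)
    # Collated results: low spread suggests the model is stable across folds.
    return statistics.mean(errors), statistics.pstdev(errors)

mean_error, spread = cross_validate(list(range(8)), [2.0] * 8, k=4)
```

A low mean error with a low spread across folds is the "low variance, results meet expectations" condition the slide describes.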
  39. Data Mining - Microsoft Decision Trees
      - Use for classification (churn and risk analysis), regression (predicting profit or income), and association analysis based on multiple predictable variables
      - Builds one tree for each predictable attribute
      - Fast
  41. Data Mining - Microsoft Naïve Bayes
      - Use for classification, and for association with multiple predictable attributes
      - Assumes all inputs are independent
      - A simple classification technique based on conditional probability
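A minimal sketch of the conditional-probability idea behind Naïve Bayes. The training rows and the add-one smoothing are illustrative, not a description of Microsoft's implementation.

```python
# Toy Naive Bayes: P(label | features) is scored as
# P(label) * product of P(feature value | label), assuming inputs are independent.
from collections import Counter, defaultdict

def train(rows):
    """rows: list of (features_dict, label). Returns priors and per-feature counts."""
    labels = Counter(label for _, label in rows)
    counts = defaultdict(Counter)  # feature -> Counter of (value, label) pairs
    for features, label in rows:
        for feat, value in features.items():
            counts[feat][(value, label)] += 1
    return labels, counts

def classify(labels, counts, features):
    total = sum(labels.values())
    best, best_score = None, -1.0
    for label, n in labels.items():
        score = n / total  # prior P(label)
        for feat, value in features.items():
            # Smoothed conditional probability P(value | label).
            score *= (counts[feat][(value, label)] + 1) / (n + 2)
        if score > best_score:
            best, best_score = label, score
    return best

rows = [({"color": "red"}, "stop"), ({"color": "red"}, "stop"),
        ({"color": "green"}, "go"), ({"color": "green"}, "go")]
labels, counts = train(rows)
```

The independence assumption is what keeps both training and scoring this cheap, and it is also the algorithm's main limitation when inputs are correlated.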
  43. Data Mining - Clustering
      - Applied to segmentation (customer grouping, mailing campaigns), and also to classification and regression
      - Anomaly detection
      - Handles discrete and continuous attributes
      - Note: "Predict Only" attributes are not used to form the clusters
  45. Data Mining - Neural Network
      - Applied to classification and regression
      - Great for finding complicated relationships among attributes
      - Results are difficult to interpret
      - Trained with the gradient descent method
      [Diagram: an input layer (age, education, sex, income), hidden layers, and an output layer predicting loyalty]
  47. Data Mining - Sequence Clustering
      - Analysis of customer behaviour, transaction patterns and click streams
      - Customer segmentation and sequence prediction
      - A mix of clustering and sequence technologies
      - Groups individuals based on their profiles, including sequence data
  48. Data Mining - What is a Sequence?
      To discover the most likely beginnings, paths and ends of a customer's journey through our domain, consider using Association Rules or Sequence Clustering
  49. Data Mining - Sequence Data
  50. Data Mining - A Minor Introduction to DMX
      - Your "if" statement will test the value returned from a prediction, typically a predicted probability or outcome
      - Steps:
        1. Build a case (a set of attributes) representing the transaction you are processing at the moment, e.g. a customer's shopping basket plus their shipping info
        2. Execute a "SELECT ... PREDICTION JOIN" against the pre-loaded mining model
        3. Read the returned attributes, especially the case probability for some outcome, e.g. probability > 50% that "TransactionOutcome=ShippingDeliveryFailure"
      - Your application has just made an intelligent decision!
      - Remember to refresh and retest the model regularly (daily?)
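The steps above can be sketched as a singleton DMX query. Only the general SELECT ... PREDICTION JOIN shape comes from the deck; the model name ([Shipping Model]) and input columns are hypothetical.

```sql
-- Hypothetical singleton prediction against a pre-loaded mining model.
SELECT
  Predict([Transaction Outcome]) AS PredictedOutcome,
  PredictProbability([Transaction Outcome],
                     'ShippingDeliveryFailure') AS FailureProbability
FROM [Shipping Model]
NATURAL PREDICTION JOIN
(SELECT 'Express' AS [Shipping Method],
        3 AS [Item Count]) AS t
```

The application then tests the returned FailureProbability (e.g. > 0.5) in its "if" statement to make the decision.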
  51. Data Mining - Sequence Clustering Parameters
      - CLUSTER_COUNT
      - MAXIMUM_SEQUENCE_STATES
      - MAXIMUM_STATES
      - MINIMUM_SUPPORT
  52. Data Mining - Detailed Workflow
  53. Data Mining - Detailed Mining Model
  54. Data Mining - Detailed Mining Model
  55. Data Mining - Detailed Mining Model
  56. Building Confidence in your Segmentation
  57. Building Confidence - Model Design
      - Which target variable to use? Frequency and severity; loss ratio and other profitability measures; binary targets (defection, cross-sell); etc.
      - How to prepare the target variable? A one-year or multi-year period? Losses evaluated as of what date? Cap large losses? Include catastrophe losses? How, or whether, to re-rate and adjust premium? What counts as a "retaining" policy? Etc.
  58. Building Confidence - Improving Models
      - Change the algorithm
      - Change the model parameters
      - Change the inputs/outputs to avoid bad correlations
      - Clean the data set
      - Verify the statistics (Data Explorer)
      - Accept that perhaps there are no good patterns in the data
  59. Building Confidence - Alternate Methods
      - Capping: outliers are reduced in influence to produce better estimates
      - Binning: small and insignificant levels of character variables are regrouped
      - Box-Cox transformations: commonly included, especially the square root and logarithm
      - Johnson transformations: performed on numeric variables to make them more "normal"
      - Weight of evidence: created for character variables and binned numeric variables
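Two of the refinements above, capping and binning, can be sketched in a few lines; the thresholds are illustrative, not from the deck.

```python
# Minimal sketches of capping outliers and binning rare levels.
from collections import Counter

def cap(values, lower, upper):
    """Clamp extreme values so outliers have less influence on estimates."""
    return [min(max(v, lower), upper) for v in values]

def bin_rare_levels(levels, min_count=5, other="OTHER"):
    """Regroup small, insignificant levels of a character variable."""
    counts = Counter(levels)
    return [lvl if counts[lvl] >= min_count else other for lvl in levels]
```

For example, `cap([1, 50, 999], 0, 100)` pulls the 999 outlier down to 100, and `bin_rare_levels` folds any level seen fewer than `min_count` times into a single catch-all level before modeling.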
  60. Building Confidence - Confusion Matrix
      - 1241 correct predictions (516 + 725)
      - 35 incorrect predictions (25 + 10)
      - The model scored 1276 cases (1241 + 35)
      - The error rate is 35/1276 ≈ 0.0274
      - The accuracy rate is 1241/1276 ≈ 0.9726
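The confusion-matrix arithmetic above, checked in code. The cell values come from the slide, but the slide does not say which off-diagonal cell is false positives and which is false negatives, so those labels are an assumption.

```python
# Confusion-matrix rates from the four cell counts on the slide.
true_pos, true_neg = 516, 725     # correct predictions, by class
false_pos, false_neg = 25, 10     # labeling of the two error cells is assumed

correct = true_pos + true_neg     # 1241
incorrect = false_pos + false_neg # 35
total = correct + incorrect       # 1276

error_rate = incorrect / total
accuracy = correct / total
```

Note that error rate and accuracy always sum to 1, so only one of them needs to be computed.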
  61. Building Confidence - Warning Signs
      "All models are wrong, but some are useful."
      -- George Box
  62. Building Confidence - Li's Revenge
      - Extrapolation: applying models from unrelated disciplines
      - Equality: the real world contains a surprising amount of uncertainty, fuzziness and precariousness
      - Copula: binding probabilities can mask errors
      - Distribution functions: small miscalculations can make coincidences look like certainties
      - Gamma: human behavior is difficult to quantify as a linear parameter
  63. Building Confidence - Lift Curves
      - Sort the data by score and break the dataset into 10 equal pieces
      - Best "decile": lowest score, hence lowest loss ratio
      - Worst "decile": highest score, hence highest loss ratio
      - The difference between them is the "lift"
      - Lift = segmentation power, which translates into the ROI of the modeling project
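The decile lift calculation above can be sketched as follows. The scores and outcomes are invented so that high scores track the bad outcome; a real model's lift would come from its scored validation data.

```python
# Minimal decile lift: sort by score, cut into 10 equal pieces, and compare
# the outcome rate in the worst decile to the overall rate.
def decile_lift(scored_cases):
    """scored_cases: list of (score, outcome) with outcome in {0, 1}."""
    ranked = sorted(scored_cases, key=lambda c: c[0])
    n = len(ranked) // 10
    deciles = [ranked[i * n:(i + 1) * n] for i in range(10)]
    rates = [sum(outcome for _, outcome in d) / len(d) for d in deciles]
    overall = sum(outcome for _, outcome in scored_cases) / len(scored_cases)
    return rates[-1] / overall  # lift of the worst (highest-score) decile

# 100 invented cases: the 10 highest-scoring cases all have the bad outcome.
cases = [(i, 1 if i >= 90 else 0) for i in range(100)]
lift = decile_lift(cases)
```

A lift of 1.0 means the model separates nothing; the further the extreme deciles pull away from the overall rate, the more segmentation power (and ROI) the model delivers.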
  64. Building Confidence - Vetted Results
      - From the top ~5% of 750,000 customers, take the 37,500 top customers from the prediction list sorted by score
      - Draw two groups of 10,000 customers by random sampling:
        - Group 1: engaged or offered incentives by the marketing department
        - Group 2: no action
      - Results: Group 1 has an attrition rate of 0.8%, Group 2 has 10.6%, and the average attrition rate is 2.2%; the lift is 4.8 (10.6% / 2.2%)
  65. Discussion
  66. Credits
      - Xiaohua Hu, Drexel University
      - Jerome Friedman, Trevor Hastie and Robert Tibshirani, The Elements of Statistical Learning
      - James Guszcza, Deloitte
      - Jeff Kaplan, Apollo Data Technologies
      - Rafal Lukawiecki, Project Botticelli Ltd
      - David L. Olson, University of Nebraska-Lincoln
      - Donald Farmer, ZhaoHui Tang and Jamie MacLennan, Microsoft
      - ASA Corporation
      - Richard Boire, Boire Filler Group
      - John Spooner, SAS Corporation
      - Shin-Yuan Hung and Hsiu-Yu Wang, National Chung Cheng University
      - Felix Salmon and Chris Anderson, Wired Magazine