

  1. 1. Customer Relationship Management: A Databased Approach. V. Kumar, Werner J. Reinartz. Instructor’s Presentation Slides
  2. 2. Chapter Ten: Data Mining
  3. 3. Topics Discussed <ul><li>Applications of Data Mining </li></ul><ul><li>Involvement of the three main groups participating in a data-mining project </li></ul><ul><li>Overview of the Data Mining Process </li></ul><ul><li>CRM at Work: Credite Est and Yapi Kredi </li></ul>
  4. 4. Applications of Data Mining <ul><li>Reducing churn with the help of predictive models, which enable early identification of those customers likely to stop doing business with the company </li></ul><ul><li>Increasing customer profitability by identifying customers with a high growth potential </li></ul><ul><li>Reducing marketing costs by more selective targeting </li></ul>
  5. 5. Overview of the Data Mining Process <ul><li>Phases (a cycle): (Re)define Business Objectives, Get Raw Data, Identify Relevant Variables, Gain Customer Insight, Act, Learn </li></ul><ul><li>(Re)define Business Objectives: define objectives and expectations; define measurement of success </li></ul><ul><li>Activities across Get Raw Data, Identify Relevant Variables, Gain Customer Insight, and Act, in order: extract descriptive and transactional data; check quality; roll up data; create analytical variables; enhance analytical data; select relevant variables; train predictive models; compare models; select models; deploy models; monitor performance; enhance models </li></ul>
  6. 6. Timeframe of Data Mining Methodology <ul><li>Phases: (Re-)Define Business Objectives, Get Raw Data, Identify Relevant Variables, Gain Customer Insight, Act </li></ul><ul><li>Today: Most time is spent on data extraction, transformation, and data quality (60-70% of process time) </li></ul><ul><li>Tomorrow: Most time is spent on business objectives and customer insight, with data extraction, transformation, and data quality taking less than 30% of process time </li></ul>
  7. 7. Extent of Involvement of the Three Main Groups Participating in a Data-Mining Project <ul><li>[Figure: extent of involvement of the three groups (1. Business, 2. Data Mining, 3. IT) across the phases (Re)Define Business Objectives, Get Raw Data, Identify Relevant Variables, Gain Customer Insight, and Act] </li></ul>
  8. 8. Involvement of Business, Data Mining and IT Resources in a Typical Data Mining Project <ul><li>Data mining group: </li></ul><ul><ul><li>Understand the business objectives and support the business group to refine and sometimes correct the scope, and expectations </li></ul></ul><ul><ul><li>Most active during the variable selection and modeling phase </li></ul></ul><ul><ul><li>Share the obtained customer insights with the business group </li></ul></ul><ul><li>IT resources: </li></ul><ul><ul><li>Required for the sourcing and extraction of the required data used for modeling </li></ul></ul><ul><li>Business group: </li></ul><ul><ul><li>Involved in checking the plausibility and soundness of the solution in business terms </li></ul></ul><ul><ul><li>Takes the lead in deploying the new insights into corporate action such as a call center or direct mail campaign </li></ul></ul>
  9. 9. Manipulations to Data Set <ul><li>Column manipulations: </li></ul><ul><ul><li>Transformation </li></ul></ul><ul><ul><li>Derivation </li></ul></ul><ul><ul><li>Elimination </li></ul></ul><ul><li>Row manipulations </li></ul><ul><ul><li>Aggregation </li></ul></ul><ul><ul><li>Change detection </li></ul></ul><ul><ul><li>Missing value detection </li></ul></ul><ul><ul><li>Outlier detection </li></ul></ul>
  10. 10. Data Preparation <ul><li>For modeling, incoming data is sampled and split into the following streams: </li></ul><ul><li>Train set: Used to build the models </li></ul><ul><li>Test set: Used for out-of-sample tests of the model quality and to select the final model candidate </li></ul><ul><li>Scoring data: Used for model-based prediction; typically large compared with the other data sets </li></ul>
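The train/test split described on this slide can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_train_test`, the 30% test share, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into train and test sets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(records)           # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical customer IDs standing in for the sampled modeling data.
customers = list(range(1, 101))
train_set, test_set = split_train_test(customers)
print(len(train_set), len(test_set))
```

The scoring data is not part of this split: it is the (usually much larger) set of current customers to which the selected model is applied later.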
  11. 11. Define Business Objectives <ul><li>Model expected customer potential in order to target the acquisition of customers who will be profitable over the whole lifetime of the business relationship </li></ul><ul><li>Distinguish between customers with a target variable equal to zero and customers with a target variable equal to one </li></ul><ul><li>Establish likelihood threshold levels above which the business group thinks a prospect should be included in the marketing campaign </li></ul>
  12. 12. Define Business Objectives (contd.) <ul><li>Define the set of business or selection rules for the campaign (e.g., the customers that should be excluded from or included in the target groups) </li></ul><ul><li>Define the details of project execution, specifying the start and delivery dates of the data mining process and the responsible resources for each task </li></ul><ul><li>Define the chosen experimental setup for the campaign </li></ul><ul><li>Define a cost/revenue matrix describing how the business mechanics will work in the supported campaign and how it will impact the data mining process </li></ul><ul><li>Establish the criteria for evaluating the success of the campaign </li></ul><ul><li>Find a benchmark to compare against: results obtained in the past for the same or similar campaign setups using traditional targeting methods rather than predictive models </li></ul>
  13. 13. Cost/Revenue Matrix <ul><li>Will have an impact on the choice of model parameters such as the cut-off point for the selected model scores </li></ul><ul><li>It will also give business users an immediately interpretable table </li></ul>
  14. 14. Cost/Revenue Matrix <ul><li>Assuming the average cost per call is $5, each positive responder (purchaser) will generate additional cost due to: </li></ul><ul><ul><li>the administration work required to register them as a new customer </li></ul></ul><ul><ul><li>the cost of the delivered phone handset (say, $100) </li></ul></ul><ul><li>Customers who respond positively will generate average revenue of $1000 per year </li></ul><ul><li>Cost/revenue matrix: </li></ul><ul><ul><li>Model predicts prospect will purchase (contacted), in reality prospect did purchase: cost -$5 - $100; 1st year revenue +$1000; total +$895 </li></ul></ul><ul><ul><li>Model predicts prospect will purchase (contacted), in reality prospect did not purchase: cost -$5; 1st year revenue $0; total -$5 </li></ul></ul><ul><ul><li>Model predicts prospect will not purchase (not contacted), in reality prospect did purchase: lost business opportunity of +$895 </li></ul></ul><ul><ul><li>Model predicts prospect will not purchase (not contacted), in reality prospect did not purchase: cost $0; 1st year revenue $0; total $0 </li></ul></ul>
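To make the link between the cost/revenue matrix and the score cut-off concrete, here is a small sketch using the slide's figures ($5 per call, $100 handset, $1000 first-year revenue). The function names are hypothetical, and the administration cost is left out because the slide does not quantify it (the slide's own total of +$895 omits it as well).

```python
COST_PER_CALL = 5           # average cost per call, from the slide
HANDSET_COST = 100          # cost of the delivered handset, from the slide
FIRST_YEAR_REVENUE = 1000   # average first-year revenue, from the slide


def contact_payoff(purchased):
    """First-year payoff of contacting one prospect."""
    if purchased:
        return FIRST_YEAR_REVENUE - COST_PER_CALL - HANDSET_COST
    return -COST_PER_CALL


def expected_payoff(p_purchase):
    """Expected payoff of contacting a prospect with the given purchase probability."""
    return (p_purchase * contact_payoff(True)
            + (1 - p_purchase) * contact_payoff(False))


def break_even_probability():
    """Purchase probability at which contacting a prospect has zero expected payoff."""
    gain = contact_payoff(True)     # +895
    loss = -contact_payoff(False)   # 5
    return loss / (gain + loss)


print(contact_payoff(True), contact_payoff(False))
print(round(break_even_probability(), 4))
```

Under these simplified assumptions, contacting a prospect pays off in expectation whenever the predicted purchase probability exceeds the break-even value, which is one way a cost/revenue matrix can inform the cut-off point.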
  15. 15. Get Raw Data <ul><li>Identify, extract and consolidate raw data in a database (often called “Analytical Data Mart”) </li></ul><ul><li>Check the quality of the analytical raw data: technical checks as well as ensuring that the data makes sense in the given business context </li></ul>
  16. 16. Get Raw Data (contd.) <ul><li>Step 1: Looking for Data Sources </li></ul><ul><ul><li>Mixed top-down and bottom-up process, driven by business requirements (top) and technical restrictions (bottom) </li></ul></ul><ul><li>Step 2: Loading the Data </li></ul><ul><ul><li>Define how the data will be imported into the data mining environment </li></ul></ul><ul><li>Step 3: Checking Data Quality </li></ul><ul><ul><li>Technical aspects of the data: primary keys, duplicate records, missing values </li></ul></ul><ul><ul><li>Business context: realistic data </li></ul></ul>
  17. 17. Step 1: Looking for Data Sources <ul><li>Data warehouse infrastructures with advanced data cleansing processes can help ensure that you are working with high-quality data </li></ul><ul><li>Build a (simple) relational data model onto which the source data will be mapped </li></ul>
  18. 18. Step 2: Loading the Data <ul><li>Define further query restrictions, prepared by IT teams, for execution at pre-defined time windows in batch mode </li></ul><ul><li>Deliver extracted data to the data mining environment in a pre-defined format </li></ul><ul><li>Process the data further and use it to fill the previously defined data model in the data mining environment as part of the ETL (Extract-Transform-Load) process </li></ul>
  19. 19. Step 3: Checking Data Quality <ul><li>Assess and understand limitations of data resulting from its inherent quality (good or bad) aspects </li></ul><ul><li>Create an analytical database as the basis for subsequent analyses </li></ul><ul><li>Carry out preliminary data quality assessment </li></ul><ul><ul><li>To assure an acceptable level of quality of the delivered data </li></ul></ul><ul><ul><li>To ensure that the data mining team has a clear understanding of how to interpret the data in business terms </li></ul></ul><ul><li>Data miners have to carry out some basic data interpretation and aggregation exercises </li></ul>
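A preliminary data quality assessment of the kind described here (duplicate primary keys, missing values, values that are implausible in business terms) might look like the sketch below. The field names, the sample rows, and the 0 to 120 age range are illustrative assumptions only.

```python
from collections import Counter


def quality_report(rows, key):
    """Count duplicate primary keys and missing values per field."""
    key_counts = Counter(row[key] for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] += 1
    return {"duplicate_keys": duplicate_keys, "missing": dict(missing)}


# Hypothetical extract from an analytical data mart.
rows = [
    {"customer_id": 1, "age": 34, "balance": 1200.0},
    {"customer_id": 2, "age": None, "balance": 50.0},
    {"customer_id": 2, "age": 51, "balance": None},
    {"customer_id": 3, "age": 230, "balance": 75.0},
]

report = quality_report(rows, key="customer_id")

# Business-context check: flag ages outside an assumed plausible range.
implausible_ages = [
    r["customer_id"] for r in rows
    if r["age"] is not None and not 0 <= r["age"] <= 120
]

print(report)
print(implausible_ages)
```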
  20. 20. Identify Relevant Predictive Variables <ul><li>Step 1: Create Analytical Customer View – “Flattening” the Data </li></ul><ul><li>Step 2: Create Analytical Variables </li></ul><ul><li>Step 3: Select Predictive Variables </li></ul>
  21. 21. Step 1: Create Analytical Customer View – “Flattening” the Data <ul><li>The individual customer constitutes the observational unit for data analysis and predictive modeling </li></ul><ul><li>All data pertaining to an individual customer is contained in one observation (row, record) </li></ul><ul><li>Individual columns (variables, fields) represent the conditions at specific points in time or a summary over a whole period </li></ul><ul><li>Define the target (dependent) variable: its values should be generated for all customers and added to the existing data tables </li></ul>
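“Flattening” can be illustrated by rolling transaction records up to one row per customer. The sketch below assumes a hypothetical transaction layout (`customer_id`, `month`, `amount`) and three summary columns; a real analytical customer view would carry many more variables, plus the target variable.

```python
from collections import defaultdict

# Hypothetical transaction records: several rows per customer.
transactions = [
    {"customer_id": "A", "month": "2024-01", "amount": 120.0},
    {"customer_id": "B", "month": "2024-02", "amount": 40.0},
    {"customer_id": "A", "month": "2024-03", "amount": 80.0},
]


def flatten(transactions):
    """Roll transaction rows up to one summary row per customer."""
    view = defaultdict(
        lambda: {"n_transactions": 0, "total_amount": 0.0, "last_month": ""}
    )
    for t in transactions:
        row = view[t["customer_id"]]
        row["n_transactions"] += 1
        row["total_amount"] += t["amount"]
        # ISO-style month strings compare correctly as plain strings.
        row["last_month"] = max(row["last_month"], t["month"])
    return {cid: dict(row) for cid, row in view.items()}


customer_view = flatten(transactions)
print(customer_view)
```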
  22. 22. Step 2: Create Analytical Variables <ul><li>Introduce additional variables derived from the original ones </li></ul><ul><li>When needed, transform variables to get new and more predictive variables </li></ul><ul><li>Increase normality of variable distributions to help the predictive model training process </li></ul><ul><li>Missing value management is key for enhancing the quality of the analytical data set </li></ul>
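Two of the operations named on this slide, missing value management and transforming a variable toward a more symmetric distribution, can be sketched as below. Median imputation and a `log1p` transform are assumed techniques picked for illustration (the slides do not prescribe specific ones), and the log transform as written only suits non-negative values.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def log_transform(values):
    """Compress a right-skewed, non-negative variable with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical account balances with one missing value and one extreme value.
balances = [100.0, None, 250.0, 40000.0, 90.0]
filled = impute_median(balances)
transformed = log_transform(filled)

print(filled)
print([round(t, 2) for t in transformed])
```

After the transform, the extreme balance no longer dominates the scale, which is the kind of effect "increasing normality" aims for.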
  23. 23. Step 3: Select Predictive Variables <ul><li>Inspect the descriptive statistics of the univariate distributions of all available variables </li></ul><ul><li>Exclude variables: </li></ul><ul><ul><ul><li>which take on only one value (i.e. the variable is a constant) </li></ul></ul></ul><ul><ul><ul><li>with mostly missing values </li></ul></ul></ul><ul><ul><ul><li>directly or indirectly identifying an individual customer </li></ul></ul></ul><ul><ul><ul><li>showing collinearities </li></ul></ul></ul><ul><ul><ul><li>showing very little correlation with the target variable </li></ul></ul></ul><ul><ul><ul><li>containing personal identifiers </li></ul></ul></ul><ul><li>Define a threshold for the share of missing values above which a field is excluded from further analysis (e.g. more than 95% missing values) </li></ul><ul><li>Check that all variables have been mapped to the appropriate data types </li></ul>
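Some of the exclusion rules above are mechanical enough to sketch in code. The example below covers three of them (identifier fields, constants, and fields above a missing-value threshold); collinearity and correlation with the target are not handled. The column-oriented `table` layout, the column names, and the function name `select_variables` are assumptions made for the illustration.

```python
def select_variables(table, id_fields, max_missing_share=0.95):
    """Return the column names that survive the basic exclusion rules."""
    kept = []
    for name, values in table.items():
        if name in id_fields:
            continue  # identifies an individual customer
        observed = [v for v in values if v is not None]
        missing_share = 1 - len(observed) / len(values)
        if missing_share > max_missing_share:
            continue  # mostly missing values
        if len(set(observed)) <= 1:
            continue  # constant: takes on only one value
        kept.append(name)
    return kept


# Hypothetical analytical table with 40 customers, stored column by column.
n = 40
table = {
    "customer_id": list(range(n)),
    "country": ["FR"] * n,                       # constant
    "fax_number": [None] * (n - 1) + ["x"],      # 97.5% missing
    "age": [20 + i for i in range(n)],
    "balance": [None if i % 4 == 0 else 100.0 * i for i in range(n)],
}

print(select_variables(table, id_fields={"customer_id"}))
```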
  24. 24. Gain Customer Insight <ul><li>Step 1: Preparing Data Samples </li></ul><ul><li>Step 2: Predictive Modeling </li></ul><ul><li>Step 3: Select Model </li></ul>
  25. 25. Step 1: Preparing Data Samples <ul><li>Analyze whether sufficient data is available to obtain statistically significant results </li></ul><ul><li>If enough data is available, split the data into two samples: </li></ul><ul><ul><li>the train set to fit the models </li></ul></ul><ul><ul><li>the test set to check the model’s performance on observations that have not been used to build it </li></ul></ul>
  26. 26. Step 2: Predictive Modeling <ul><li>Two steps: </li></ul><ul><ul><li>The rules (or linear/non-linear analytical models) are built based on a training set </li></ul></ul><ul><ul><li>These rules are then applied to a new dataset for generating the answers needed for the campaign </li></ul></ul><ul><li>Guidelines: </li></ul><ul><ul><li>Distinguish between different types of predictive models obtained through different modeling paradigms: supervised and unsupervised modeling </li></ul></ul><ul><ul><li>Find the right relationships between variables describing the customers to predict their respective group membership likelihood: purchaser or non-purchaser, referred to as scoring (e.g. between 0 and 1) </li></ul></ul><ul><ul><li>Apply unsupervised modeling where group membership is not known beforehand </li></ul></ul>
  27. 27. Step 3: Select Model <ul><ul><li>Compare the relative quality of prediction by comparing the misclassification rates obtained on the test set </li></ul></ul><ul><ul><li>Example of a misclassification error rate or confusion matrix (Input Node - Classification Neural Network (10); axes: Observed and Predicted; classes 0 and 1): </li></ul></ul><ul><ul><ul><li>Row 1: 726, 56; total 782 </li></ul></ul></ul><ul><ul><ul><li>Row 0: 173, 504; total 677 </li></ul></ul></ul><ul><ul><ul><li>Column totals: 899, 560; grand total 1459 </li></ul></ul></ul>
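The misclassification rate can be computed from the counts on this slide. The flattened slide does not make the cell layout unambiguous, so the sketch below rests on an assumption: 504 (class 0) and 726 (class 1) are taken to be the correctly classified cases, and 173 and 56 the misclassified ones. The rate itself does not depend on which axis is observed and which is predicted.

```python
# confusion[observed][predicted] = count.
# Assumed layout: 504 and 726 on the diagonal (correct classifications).
confusion = {
    0: {0: 504, 1: 173},
    1: {0: 56, 1: 726},
}


def misclassification_rate(confusion):
    """Share of cases whose predicted class differs from the observed class."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c].get(c, 0) for c in confusion)
    return (total - correct) / total


rate = misclassification_rate(confusion)
print(f"{rate:.1%}")  # prints 15.7%
```

Comparing this figure across candidate models, all evaluated on the same test set, is the selection step the slide describes.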
  28. 28. Act <ul><li>Step 1: Deliver Results to Operational Systems </li></ul><ul><li>Step 2: Archive Results </li></ul><ul><li>Step 3: Learn </li></ul>
  29. 29. Step 1: Deliver Results to Operational Systems <ul><li>Apply the selected model to the entire customer base </li></ul><ul><li>Prepare score data set containing the most recent information for each customer with the variables required by the model </li></ul><ul><li>The obtained score value for each customer and the defined threshold value will determine whether the corresponding customer qualifies to participate in the campaign </li></ul><ul><li>When delivering results to the operational systems, provide necessary customer identifiers to unambiguously link the model’s score information to the correct customer </li></ul>
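The selection logic described here, where a score per customer is compared against a threshold and the result is keyed by a customer identifier, can be sketched as follows. The scores, the identifiers, and the 0.5 threshold are made-up values for illustration; in practice the threshold would come from the business objectives and the cost/revenue matrix.

```python
def select_for_campaign(scores, threshold):
    """Return the IDs of customers whose score reaches the threshold, best first."""
    qualified = [(cid, s) for cid, s in scores.items() if s >= threshold]
    qualified.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in qualified]


# Hypothetical model scores keyed by the operational customer identifier.
scores = {"C001": 0.82, "C002": 0.10, "C003": 0.47, "C004": 0.65}
target_list = select_for_campaign(scores, threshold=0.5)
print(target_list)
```

Keeping the customer identifier attached to every score is what lets the operational system link the model output back to the correct customer.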
  30. 30. Step 2: Archive Results <ul><li>Each data mining project will produce a huge amount of information including: </li></ul><ul><ul><li>raw data used </li></ul></ul><ul><ul><li>transformations for each variable </li></ul></ul><ul><ul><li>formulas for creating derived variables </li></ul></ul><ul><ul><li>train, test and score data sets </li></ul></ul><ul><ul><li>target variable calculation </li></ul></ul><ul><ul><li>models and their parameterizations </li></ul></ul><ul><ul><li>score threshold levels </li></ul></ul><ul><ul><li>final customer target selections </li></ul></ul><ul><li>Useful to preserve especially if the same model is used to score different data sets obtained at different times </li></ul>
  31. 31. Step 3: Learn <ul><li>Referred to as “closing the loop” </li></ul><ul><li>Obtain the facts describing performance of data mining project and business impact </li></ul><ul><li>Obtained by monitoring campaign performance while it is running and from final campaign performance analysis after the campaign has ended </li></ul><ul><li>Detect when a model has to be re-trained </li></ul>
  32. 32. CRM at Work: Credite Est <ul><li>Regional mid-tier bank in France: use of data mining in marketing </li></ul><ul><li>Uses a segmentation scheme based on behavioral characteristics (e.g. product ownership), and an activity-based-costing system to identify individual customer level contribution margin </li></ul><ul><li>Project </li></ul><ul><ul><li>Business goal: to acquire new prospects </li></ul></ul><ul><ul><li>Objective: to identify the characteristics of profitable customers in Credite Est’s mass-market segment to efficiently target similar profiles in the prospect pool </li></ul></ul>
  33. 33. Credite Est (contd.) <ul><li>Get Raw Data </li></ul><ul><ul><li>Response variable for current customers is customer contribution margin </li></ul></ul><ul><ul><li>Customers sorted by operating contribution and profile of the top 20% of customers noted </li></ul></ul><ul><ul><li>Transaction information on prospects purchased and then appended to individual records of existing customers </li></ul></ul><ul><li>Identify relevant variables </li></ul><ul><ul><li>To find the profile that best characterizes high value clients which is subsequently applied to prospects’ information </li></ul></ul><ul><ul><li>Model attempts to predict customer operating margin as dependent variable with geodemographic information as independent variables </li></ul></ul><ul><ul><li>Credite Est appended a total of 65 variables to existing customer records </li></ul></ul>
  34. 34. Credite Est (contd.) <ul><li>Select Predictive Variables </li></ul><ul><ul><li>All variables that were appended had almost 50% missing data </li></ul></ul><ul><ul><li>Assessing whether any of the missing data could be meaningfully replaced improved the overall rate of missing values from 42% to 21% </li></ul></ul><ul><ul><li>Investigation of univariate statistics (means, standard deviations, frequencies, outliers) for all variables brought reduction in variables from 65 to 54 </li></ul></ul><ul><ul><li>Calculation of all bi-variate correlations (or mean analyses in case of categorical variables) of existing independent variables with the dependent variable – customer value </li></ul></ul><ul><ul><li>Data evaluation process resulted in a total of 17 variables that had a reasonable correlation with the dependent variable. These were retained for the next step, the response model </li></ul></ul>
  35. 35. Credite Est (contd.) <ul><li>Gain Customer Insight </li></ul><ul><ul><li>Use logistic regression to classify the dependent variable as 0/1; the goal being to either target or not target a certain individual in the prospect pool </li></ul></ul><ul><ul><li>Theory-based elimination of variables that are highly collinear </li></ul></ul><ul><ul><li>The model classified 75.5% of cases correctly in the estimation sample and 69.8% in the holdout sample, roughly 20% higher than based on chance alone </li></ul></ul><ul><ul><li>Result was deemed successful and it was decided to utilize this model for a prospecting campaign </li></ul></ul>
  36. 36. Credite Est (contd.) <ul><li>Act </li></ul><ul><ul><li>Final model was rolled out in sequential fashion to target prospect audience </li></ul></ul><ul><ul><li>Credite Est purchased addresses from list brokers that had non-missing values for at least 3 of the 5 variables in the final model </li></ul></ul><ul><ul><li>The prospects were scored with the model and then ranked by likelihood of being a high value customer </li></ul></ul><ul><ul><li>Objective was to assess the receptivity of the two samples of customers for respective products </li></ul></ul><ul><ul><li>Result: Both target mailings were significantly more successful than the base line scenario </li></ul></ul>
  37. 37. CRM at Work: Yapi Kredi – Predictive Model Based Cross-Sell Campaign <ul><li>Challenge: To continue YAPI KREDI’s development as the fastest growing retail bank in Turkey </li></ul><ul><li>Capabilities required: </li></ul><ul><ul><li>Advanced analytical customer segmentation </li></ul></ul><ul><ul><li>Segment specific offering of product bundles </li></ul></ul><ul><ul><li>Conversion of customers to more profitable segments via targeted campaigns using advanced CRM tools such as predictive modeling </li></ul></ul><ul><li>Project plan: </li></ul><ul><ul><li>To carry out a set of pilot projects for cross-selling of consumer banking products </li></ul></ul><ul><ul><li>A reduced selection of target customers with a high propensity to positively respond would be included in a multi-channel, two-step campaign </li></ul></ul>
  38. 38. Yapi Kredi - Define Business Objectives <ul><li>YAPI KREDI’s B-type mutual funds, characterized by </li></ul><ul><ul><li>Being low risk investment instruments based on fixed income securities </li></ul></ul><ul><ul><li>Easily purchased via the ATM, Web, and Telephone channels </li></ul></ul><ul><li>Offer to two customer groups: </li></ul><ul><ul><li>Customers already having invested into B-type mutual funds to stimulate an increase of the assets </li></ul></ul><ul><ul><li>Customers not yet owning any B-type fund to help increase product ratio and attract new money </li></ul></ul>
  39. 39. Yapi Kredi - Define Business Objectives (contd.) <ul><li>Communication channels: two-channel approach </li></ul><ul><li>Campaign sizing: Contact 3000 customers by branch based out-bound calls and active marketing during customer branch visits </li></ul><ul><li>Campaign: Two-step </li></ul><ul><ul><li>Customers were first contacted with the B-type mutual fund offer </li></ul></ul><ul><ul><li>Positive responders received a follow up call if they had not purchased within one week of their initial positive response </li></ul></ul><ul><li>Evaluation of results: Based on response and purchase rates by contact channel (branch or call center) </li></ul>
  40. 40. Yapi Kredi - Get Raw Data & Identify Relevant Variables <ul><li>Get Raw Data: </li></ul><ul><ul><li>Data mart with data extracted from more than 50 source system tables </li></ul></ul><ul><ul><li>About 20 database tables were produced with 30 gigabytes of disk space for the initial project phase </li></ul></ul><ul><li>Identify Relevant Variables - customer attributes describing: </li></ul><ul><ul><li>Demographics </li></ul></ul><ul><ul><li>Product Ownership </li></ul></ul><ul><ul><li>Product Usage </li></ul></ul><ul><ul><li>Channel usage </li></ul></ul><ul><ul><li>Assets </li></ul></ul><ul><ul><li>Liabilities </li></ul></ul><ul><ul><li>Profitability </li></ul></ul>
  41. 41. Yapi Kredi - Gain Customer Insight <ul><li>Based on six months of historical customer data, five different predictive models were developed </li></ul><ul><li>Best model: logistic regression </li></ul><ul><ul><li>Yielding a lift value of 2.9 and a cumulative response rate of 14% for the top customer percentile </li></ul></ul><ul><ul><li>Reaches 2.9 times more responders for the top customer percentile than a random selection of the same size </li></ul></ul><ul><ul><li>A set of 4200 customers with the highest propensity to purchase was selected as the target group for the pilot campaign </li></ul></ul>
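Lift, as used on this slide, compares the response rate among the highest-scored customers with the response rate of the whole group (which is what a random selection of the same size would be expected to achieve). The sketch below uses invented scores and responses, not Yapi Kredi data, and the function name `lift_at` is hypothetical.

```python
def lift_at(scored, top_share):
    """Lift of the top-scored share of customers over the overall response rate.

    scored: list of (score, responded) pairs, where responded is 0 or 1.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_share))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    overall_rate = sum(r for _, r in ranked) / len(ranked)
    return top_rate / overall_rate


# Hypothetical scored customers: 3 responders among 10.
scored = [
    (0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
    (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.0, 0),
]

lift = lift_at(scored, top_share=0.2)
print(round(lift, 2))
```

In this toy data the top 20% respond at a rate of 100% against 30% overall, so the lift is about 3.33; a lift of 1 would mean the model does no better than random selection.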
  42. 42. Yapi Kredi - Act <ul><li>A subset of 3000 customers was assigned to the 16 branches holding the responsibility for the respective relationships </li></ul><ul><li>The remaining 1200 customers were assigned to the call center </li></ul><ul><li>The target list with the corresponding channel assignment was made available to the campaign management system </li></ul>
  43. 43. Yapi Kredi - Result <ul><li>Result: </li></ul><ul><ul><li>Impressive response rates of 6.5% and 12.2% were obtained with the branch based part of the campaign and the call center based part of the campaign respectively </li></ul></ul><ul><ul><li>The pilot campaign acquired more than € 1 million into B-type mutual funds </li></ul></ul>
  44. 44. Summary <ul><li>Data Mining can assist in selecting the right target customers or in identifying previously unknown customers with similar behavior and needs </li></ul><ul><li>A good target list is likely to increase purchase rates, and have a positive impact on revenue </li></ul><ul><li>In the context of CRM, the individual customer is often the central object analyzed by means of data mining methods </li></ul><ul><li>A complete data mining process comprises assessing and specifying the business objectives, data sourcing, transformation and creation of analytical variables, and building analytical models using techniques such as logistic regression and neural networks, scoring customers and obtaining feedback from the field </li></ul><ul><li>Learning and refining the data mining process is the key to success </li></ul>