BRIDGEi2i Whitepaper - The Science of Customer Experience Management
Mevsys Data Mining: one product per customer.
1. MEVSYS DATA MINING
ONE PRODUCT PER CUSTOMER
Brief outline of a project in
deployment since 2008.
2013 1
TO PROTECT CUSTOMER CONFIDENTIALITY SOME REFERENCES HAVE BEEN OMITTED AND/OR GENERALIZED
PERCENTAGES AND RESULTS ARE KEPT UNTOUCHED
2. BUSINESS UNDERSTANDING
The company established that when a client contacts customer service,
there is sometimes an opportunity to offer that client a product.
With seven different product types, the question was which product
was the ideal one for each client.
CRM wanted to offer the product that best matches each client’s
characteristics, avoiding proposals of little interest to the client and
building a business relationship based on the client’s needs.
The ideal product was therefore defined as the one the client was
closest to buying unprompted, without any direct marketing stimuli.
3. DATA UNDERSTANDING
General, demographic, product and transactional history was extracted
from the client database. Predictive and outcome variables were
obtained from this source.
The history of direct marketing sales actions was extracted from the
campaigns database. In line with the objective, this information was
used to exclude clients who received any buying stimulus from the
company during the analyzed period.
Data is summarized per month and becomes available around the 15th
of the following month, so predictive models are deployed with a
one-month window.
Example: with January data, obtained in mid-February, predictions of
what will happen during March are deployed.
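
The timing in the example above can be sketched in a few lines of Python. This is only an illustration: the function name, its parameters and the year 2013 used in the call are hypothetical, and the fixed availability day of 15 stands in for "around the 15th".

```python
from datetime import date


def deployment_schedule(data_year, data_month, availability_day=15):
    """Return (availability date, target year, target month) for one data month.

    Assumption: data for month M is available on a fixed day of month M+1
    and the prediction targets month M+2.
    """
    # A running month index makes the arithmetic roll over year ends cleanly.
    index = data_year * 12 + (data_month - 1)
    avail_year, avail_month0 = divmod(index + 1, 12)
    target_year, target_month0 = divmod(index + 2, 12)
    available_on = date(avail_year, avail_month0 + 1, availability_day)
    return available_on, target_year, target_month0 + 1


# January data (hypothetical year), available mid-February, predicts March.
print(deployment_schedule(2013, 1))
```

The month between the data month and the target month is the one-month window: it is consumed by waiting for the data and deploying the predictions.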
4. DATA PREPARATION
Client tables were joined over a 5-month span: months 1, 2 and 3
provide the history for the predictive variables, month 4 is the
window (not used) and month 5 provides the variable to predict.
Campaign tables were joined to exclude clients who received direct
buying stimuli during that period.
Years' worth of such periods of history were obtained.
The outcome variable was generated according to which product
variable registered an increment between the 3rd and 5th months.
New predictive variables were generated by summarizing existing data
and calculating new information. The initial 180 variables were
expanded to 350; the final model uses about 30.
The standard transformations required by each algorithm to be tested
were applied.
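
The labelling rule described above could look roughly like the following Python sketch. The data layout, the function names and the toy clients are hypothetical. Two details are assumptions not stated in the project outline: when several products increase, the largest increment wins (first product in list order on ties), and clients with no increment are dropped.

```python
PRODUCTS = ["PROD1", "PROD2", "PROD3", "PROD4", "PROD5", "PROD6", "PROD7"]


def outcome_label(month3, month5):
    """Product whose holding grew between month 3 and month 5, or None."""
    increments = {p: month5.get(p, 0) - month3.get(p, 0) for p in PRODUCTS}
    # Assumption: largest increment wins; max() keeps the first product on ties.
    best = max(PRODUCTS, key=lambda p: increments[p])
    return best if increments[best] > 0 else None


def build_training_rows(holdings, stimulated_clients):
    """holdings maps client id -> five monthly {product: count} snapshots."""
    rows = []
    for client_id, months in holdings.items():
        if client_id in stimulated_clients:
            continue  # received direct buying stimuli: excluded
        m1, m2, m3, _window, m5 = months  # month 4 is the unused window
        label = outcome_label(m3, m5)
        if label is None:
            continue  # assumption: clients with no increment are not used
        rows.append({"client": client_id, "history": [m1, m2, m3], "outcome": label})
    return rows


# Toy data, purely illustrative.
toy = {
    "c1": [{"PROD1": 1}] * 4 + [{"PROD1": 1, "PROD4": 1}],
    "c2": [{"PROD2": 1}] * 4 + [{"PROD2": 2}],
}
print(build_training_rows(toy, stimulated_clients={"c2"}))
```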
5. MODELLING
Using genetic (evolutionary) and brute-force search, thousands of
candidate models were tested, based on 6 basic types: support vector
machines, neural networks, decision trees, logistic regression,
discriminant analysis and naive Bayes.
# Generated by Log[com.rapidminer.datatable.SimpleDataTable]
# Leaf size  Gain    Depth  Confidence  Number of Prepruning  Performance
  765        0,2307  40     0,2728      25                    29%
  30         0,0841  14     0,0387      13                    36%
  485        0,2810  18     0,4842      28                    29%
  63         0,0566  69     0,1203      20                    41%
  624        0,0662  86     0,2823      3                     38%
  765        0,2368  69     0,2611      25                    29%
  30         0,0877  14     0,0140      13                    37%
  63         0,0704  40     0,1142      20                    39%
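
A log excerpt like this one can be ranked programmatically. The sketch below is not part of the project tooling: it copies the eight rows shown, converts the decimal commas, and picks the run with the highest performance. The dictionary keys are illustrative names chosen here.

```python
LOG = """\
765 0,2307 40 0,2728 25 29%
30 0,0841 14 0,0387 13 36%
485 0,2810 18 0,4842 28 29%
63 0,0566 69 0,1203 20 41%
624 0,0662 86 0,2823 3 38%
765 0,2368 69 0,2611 25 29%
30 0,0877 14 0,0140 13 37%
63 0,0704 40 0,1142 20 39%
"""


def parse_row(line):
    """Turn one whitespace-separated log row into a typed dictionary."""
    leaf, gain, depth, confidence, prepruning, performance = line.split()
    return {
        "leaf_size": int(leaf),
        "gain": float(gain.replace(",", ".")),  # decimal comma -> decimal point
        "depth": int(depth),
        "confidence": float(confidence.replace(",", ".")),
        "prepruning": int(prepruning),
        "performance": int(performance.rstrip("%")),
    }


runs = [parse_row(line) for line in LOG.splitlines()
        if line.strip() and not line.startswith("#")]
best = max(runs, key=lambda run: run["performance"])
print(best)
```

In this excerpt the best run is the one with leaf size 63, depth 69 and 41% performance.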
6. EVALUATION
The following is a simplified confusion matrix of the final evaluation, which
complements the regular statistical validations carried out during modelling.
It describes the model’s performance by comparing the product predicted with
the product actually bought.
With a performance of 51%, the exact product is predicted for about half of
the clients.
Rows: product bought. Columns: product predicted.

BOUGHT    PROD1   PROD2   PROD3   PROD4   PROD5   PROD6   PROD7
PROD1     6.964     838     846     797     300   1.963     582
PROD2       583   3.217     298     161      56     177     127
PROD3     1.111     245   1.142     182      22     605     176
PROD4       707     214     165   2.169      74     220      59
PROD5        84     131      26      70     297      83      33
PROD6     2.491     346     585     459     304   2.469     388
PROD7       185      83     159      26      56     222     820

Buyers: 33.317 / Correct Predictions: 17.078 / Performance: 51%
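
The summary line can be checked directly against the matrix. The sketch below re-enters the counts as plain integers (the dots in the table are thousands separators), with rows as the product bought and columns as the product predicted. The per-product figure at the end is an extra illustration, not something reported in the evaluation.

```python
PRODUCTS = ["PROD1", "PROD2", "PROD3", "PROD4", "PROD5", "PROD6", "PROD7"]

# Rows: product actually bought. Columns: product predicted.
CONFUSION = [
    [6964, 838, 846, 797, 300, 1963, 582],
    [583, 3217, 298, 161, 56, 177, 127],
    [1111, 245, 1142, 182, 22, 605, 176],
    [707, 214, 165, 2169, 74, 220, 59],
    [84, 131, 26, 70, 297, 83, 33],
    [2491, 346, 585, 459, 304, 2469, 388],
    [185, 83, 159, 26, 56, 222, 820],
]

buyers = sum(sum(row) for row in CONFUSION)
# Correct predictions sit on the diagonal (predicted == bought).
correct = sum(CONFUSION[i][i] for i in range(len(PRODUCTS)))
performance = correct / buyers

print(buyers, correct, round(100 * performance))  # 33317 17078 51

# Share of each product's buyers for whom that exact product was predicted.
hit_rate = {p: CONFUSION[i][i] / sum(CONFUSION[i]) for i, p in enumerate(PRODUCTS)}
```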
7. EVALUATION II: VS NO DATA MINING
The customer service operator’s system indicates the ideal product
for the client currently being served.
If the operator offered every client the most frequently bought
product, precision would be 37%. Against this scheme, the model’s
51% is a 38% relative improvement, and it also allows a varied
offer tailored to each individual client.
If the operator used a varied offer proportional to the usual
sales distribution, precision would be just 23%. Against this scheme, the
model is a 122% relative improvement.
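
Both baselines can be reproduced from the number of buyers per product, taken here as the row totals of the confusion matrix. The sketch assumes that the "proportional" scheme means offering a product drawn at random from the sales distribution, so its expected precision is the sum of the squared product shares. That reading is an assumption, although it does yield the 23% quoted above.

```python
# Buyers per product PROD1..PROD7 (row totals of the confusion matrix).
BOUGHT = [12290, 4619, 3483, 3608, 724, 7042, 1551]
CORRECT = 17078  # correct predictions reported for the model

total = sum(BOUGHT)
shares = [n / total for n in BOUGHT]

model = CORRECT / total
most_frequent = max(shares)              # always offer the top seller
proportional = sum(s * s for s in shares)  # assumed: random offer ~ sales mix


def uplift(model_pct, baseline_pct):
    """Relative improvement of the model over a baseline, in percent."""
    return round(100 * (model_pct / baseline_pct - 1))


m, f, p = (round(100 * x) for x in (model, most_frequent, proportional))
print(m, f, p)                     # 51 37 23
print(uplift(m, f), uplift(m, p))  # 38 122
```

The uplifts are computed from the rounded percentages, which matches the figures in the text; the unrounded ratios come out slightly higher.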