Improving profitability of campaigns through data science
1. IMPROVING RETAIL SALES CAMPAIGN PROFITABILITY
FINAL CAPSTONE PROJECT: DATA ANALYSIS, FINDINGS AND RECOMMENDATIONS
SHOUNAK MONDAL
POSTGRADUATE DIPLOMA IN DATA SCIENCE, EMERITUS AND COLUMBIA UNIVERSITY
3. Background
1. The customer is a B2B retailer of office supplies, office electronics and office furniture
2. A marketing campaign was sent to ~16k of its customers
3. Detailed campaign target data and results are available, including key data per customer:
a) Resulting sales
b) Historical sales
c) Type of previous purchase
d) Communication channel preferences
e) Size of target company
f) Language
4. Objective
Analyze the campaign results and provide insights and recommendations on:
1. Which types of customers responded positively to the campaign?
2. What can the customer do to improve future campaign performance?
3. How large are the potential financial gains from the improved campaign strategies?
5. Approach
Exploratory Data Analysis
• Removed outliers in Number of Year Prior Transaction
• Dropped 10 rows with null values for No of Employees and type of previous purchase
• Imputed 442 rows with Last Transaction channel as "Unknown"
• Imputed 4470 rows with language as "Unknown"
• Removed rows with negative values for sales and historical sales volume
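The cleaning steps above could look roughly like the following pandas sketch. The column names and the toy rows are hypothetical, since the deck does not show the dataset's headers, and the outlier step is left out because the rule used is not stated.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and made-up rows; not the campaign data.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "num_employees": [10, np.nan, 50, 200, 15, 30],
    "prev_purchase_type": ["furniture", "supplies", None,
                           "electronics", "supplies", "furniture"],
    "last_txn_channel": [None, "email", "phone", "email", "phone", "mail"],
    "language": ["English", "English", "French", "English", "English", None],
    "campaign_sales": [120.0, 0.0, 80.0, -15.0, 0.0, 300.0],
    "historical_sales": [1500.0, 900.0, 400.0, 2500.0, -50.0, 7000.0],
})

# Drop rows with no employee count or no previous purchase type.
clean = df.dropna(subset=["num_employees", "prev_purchase_type"])

# Fill missing channel and language with an explicit "Unknown" category.
clean = clean.fillna({"last_txn_channel": "Unknown", "language": "Unknown"})

# Keep only rows with non-negative campaign and historical sales.
clean = clean[(clean["campaign_sales"] >= 0) & (clean["historical_sales"] >= 0)]

print(clean)
```

Filling with "Unknown" rather than dropping keeps the 442 and 4470 affected rows available for modelling, at the cost of one extra category per column.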
Data Transformation & Analysis
• Transform categorical values to binary-coded values
• Observe correlation between features and remove highly correlated features
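A minimal sketch of these two steps, assuming pandas, hypothetical column names, and an assumed correlation cut-off of 0.9 (the deck does not state the threshold that was used):

```python
import numpy as np
import pandas as pd

# Made-up rows; "employees_thousands" is deliberately redundant with "num_employees".
df = pd.DataFrame({
    "num_employees": [10, 50, 200, 15, 30, 120],
    "employees_thousands": [0.010, 0.050, 0.200, 0.015, 0.030, 0.120],
    "historical_sales": [1500.0, 400.0, 2500.0, 7000.0, 900.0, 3100.0],
    "language": ["English", "French", "English", "English", "Unknown", "French"],
})

# Binary-code the categorical column: one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["language"], dtype=int)

# Absolute pairwise correlations; keep the upper triangle so each pair appears once.
corr = encoded.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later feature of every pair whose correlation exceeds the assumed threshold.
THRESHOLD = 0.9
to_drop = [col for col in upper.columns if (upper[col] > THRESHOLD).any()]
reduced = encoded.drop(columns=to_drop)

print(to_drop)
```

Dropping one feature of each highly correlated pair keeps the regression coefficients easier to interpret, which matters here because the findings are read off those coefficients.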
Model Building
• Build a robust classification model with the most influential features to predict sales or no sales
• Build a regression model with the most influential features to predict the sales amount
• Using the predicted probability of a sale and the predicted sales amount, calculate profitability given the gross margin and the marketing and transaction costs
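The deck does not spell out the profit formula, so the following is only one plausible form, stated as an assumption: the marketing cost is paid for every targeted customer, while the margin and the transaction cost are realised only if the customer buys. The function name, parameters and numbers are illustrative.

```python
def expected_profit(p_sale, predicted_sales, gross_margin,
                    marketing_cost, transaction_cost):
    """Expected profit from targeting one customer (assumed formula).

    Margin and transaction cost are weighted by the probability of a sale;
    the marketing cost is incurred whether or not the customer buys.
    """
    return p_sale * (predicted_sales * gross_margin - transaction_cost) - marketing_cost


# Illustrative numbers only; none come from the campaign data.
likely_buyer = expected_profit(0.80, 1000.0, 0.30, 20.0, 10.0)
unlikely_buyer = expected_profit(0.05, 1000.0, 0.30, 20.0, 10.0)
print(likely_buyer, unlikely_buyer)
```

Under this form a customer with a low purchase probability has a negative expected profit, which is why targeting everyone can cost more than it earns.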
Gains / Lift Chart
• Classify the customers into deciles and show profits per decile
• Show profitable deciles and expected gains over random targeting
• Inner join, on customer ID, the probabilities from the classifier and the predicted sales from the linear regression model, then apply the profit formula to find profits
• Sort profits and split by deciles
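The join and decile split could be sketched as below. This assumes pandas; the two input frames, the column names, the cost parameters and the profit formula are all hypothetical stand-ins for the model outputs described above.

```python
import pandas as pd

# Hypothetical outputs of the two models on their respective test sets.
clf_out = pd.DataFrame({
    "customer_id": list(range(1, 26)),
    "p_sale": [i / 25 for i in range(1, 26)],
})
reg_out = pd.DataFrame({
    "customer_id": list(range(6, 31)),
    "predicted_sales": [100.0 * i for i in range(6, 31)],
})

# The inner join keeps only customers present in both test sets.
scored = clf_out.merge(reg_out, on="customer_id", how="inner")

# Assumed profit formula and cost parameters (illustrative values).
GROSS_MARGIN, MARKETING_COST, TRANSACTION_COST = 0.30, 20.0, 10.0
scored["profit"] = (
    scored["p_sale"] * (scored["predicted_sales"] * GROSS_MARGIN - TRANSACTION_COST)
    - MARKETING_COST
)

# Rank by profit (ties broken by row order) and cut the ranks into ten equal bins.
scored["rank"] = scored["profit"].rank(method="first")
scored["decile"] = pd.qcut(scored["rank"], 10, labels=False) + 1  # 10 = most profitable

per_decile = (
    scored.groupby("decile")["profit"]
    .agg(customers="count", profit_per_customer="mean", total_profit="sum")
    .sort_index(ascending=False)
)
print(per_decile)
```

Cutting on ranks rather than on raw profit guarantees equally sized deciles even when many customers share the same profit value.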
Recommendations
• Show which types of customers responded positively to the campaign
• Show what XYZ Ltd can do to improve future campaign performance
• Show how large the financial gains of the improved campaign strategies can be
6. Typical Purchasers are…
1. Those who have purchased before (particularly 16-22 transactions in the prior year)
2. Those with historical sales of up to $720,000+ (75% of the purchases)
3. Those who made purchases in 1993 and 1994; these 2 years seem to have created long-term loyal customers
[Figures: number of purchase records by year of first purchase (1926 to 2018); Coefficients – Degree of Influence, by year of purchase]
7. A classification model was built and tested on ~8000 test records. The model can now be
used to identify, predict and target positive sales candidates for future campaigns
To build a robust model, a Random Forest Classifier was trained on ~8000 campaign records, which yielded the following:
1. Prediction accuracy score: 85% - the ability of the classifier to make correct predictions
2. Precision score: 77% - the share of predicted sales that were actual sales. The cost of a low score is wasted marketing cost.
3. Recall score: 65% - the share of actual sales that the classifier found. The cost of a low score is missed revenue opportunities.
4. F1 score: 70% - the balance between precision and recall
5. Confusion matrix (rows: actual sale; columns: predicted sale)
             Predicted NO   Predicted YES
Actual NO    5400           438
Actual YES   788            1453
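As a check, the four scores can be recomputed from the confusion matrix counts using the standard definitions. The variable names below are ours; the counts are the ones reported on the slide.

```python
# Counts read from the confusion matrix (rows: actual, columns: predicted).
tn, fp = 5400, 438    # actual NO:  predicted NO, predicted YES
fn, tp = 788, 1453    # actual YES: predicted NO, predicted YES

total = tn + fp + fn + tp
accuracy = (tp + tn) / total          # correct predictions out of all predictions
precision = tp / (tp + fp)            # predicted sales that were real sales
recall = tp / (tp + fn)               # real sales that the model found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.0%} precision={precision:.0%} "
      f"recall={recall:.0%} f1={f1:.0%}")
# prints: accuracy=85% precision=77% recall=65% f1=70%
```

The 438 false positives are the customers who would receive marketing without buying, and the 788 false negatives are the buyers the model would have skipped.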
8. Now that we know which customers to target for a sale, we next built a model to predict the
“Amount of Sales” for purchasing customers and tested it on ~4384 positive-sales test records
Linear Regression was used, which yielded the following:
1. Linear regression fit score, training data: 77%
2. Linear regression fit score, test data: 75%
3. Root mean squared error of prediction: 559
4. R-squared score: 0.75 (the degree to which the model
captures and explains the variance of the data)
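For reference, the two error measures above can be written out as follows. This is a plain-Python sketch of the standard definitions with made-up numbers, not the author's code. If the fit score was produced by scikit-learn's `LinearRegression.score`, it is the same quantity as R squared, which would explain why the test fit score and the R-squared score agree.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up sales amounts, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
print(rmse(actual, predicted), r_squared(actual, predicted))
```

RMSE is expressed in the same units as the sales amount, so the reported 559 is the typical error per predicted sale.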
Findings
1. Company size has the largest influence on the
sales amount: the larger the company, the larger the sales
amount
2. Previous purchases of office furniture and computer
equipment have the next most significant influence on the
sales amount.
[Figure: Coefficients – Degree of Influence]
9. Gains Chart for ~1120 test records representing a future campaign
The 7.00 in the Total row is the total actual profit per customer of the campaign that was executed (16,000 records).
The lift chart is built from about 1120 records that appear in both the linear regression test data and the classifier test data, obtained with an "inner" join on customer number between the two dataframes.
Deciles | Number of customers per decile | Actual profitability per customer | Lift over average | Total profit | % of profit | Incr proj profit (100k customer base) | Total proj profit (100k customer base) | Cuml incr profit (100k customer base) | Cuml total profit (100k customer base)
(1009.0, 1121.0] | 112 | 504 | 497 | 55,630 | 76% | 4,967 | 5,037 | 4,967 | 5,037
(897.0, 1009.0] | 112 | 271 | 264 | 29,535 | 40% | 2,637 | 2,707 | 7,604 | 7,744
(785.0, 897.0] | 112 | 75 | 68 | 7,607 | 10% | 679 | 749 | 8,283 | 8,493
(673.0, 785.0] | 112 | 21 | 14 | 1,518 | 2% | 135 | 205 | 8,419 | 8,699
(561.0, 673.0] | 112 | -1 | -8 | (947) | -1% | (85) | (15) | 8,334 | 8,684
(449.0, 561.0] | 112 | -12 | -19 | (2,113) | -3% | (189) | (119) | 8,145 | 8,565
(337.0, 449.0] | 112 | -20 | -27 | (2,994) | -4% | (267) | (197) | 7,878 | 8,368
(225.0, 337.0] | 112 | -28 | -35 | (3,959) | -5% | (354) | (284) | 7,525 | 8,085
(113.0, 225.0] | 112 | -37 | -44 | (4,944) | -7% | (441) | (371) | 7,083 | 7,713
(0.999, 113.0] | 112 | -49 | -56 | (6,268) | -9% | (560) | (490) | 6,523 | 7,223
Total | 1120 | 7.00 | | 73,063 | 100% | | | |
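The cut-off implied by the table can be found by accumulating the incremental projected profit from the best decile downward and stopping where the running total peaks. The sketch below uses the "Incr Proj Profit" column as printed. Because those values are rounded, the running totals can differ from the table's cumulative column by 1.

```python
from itertools import accumulate

# "Incr Proj Profit, 100k Customer base" column, best decile first (from the table).
incremental_profit = [4967, 2637, 679, 135, -85, -189, -267, -354, -441, -560]

cumulative = list(accumulate(incremental_profit))

# The best number of deciles to target is where the cumulative total peaks.
best_k = max(range(len(cumulative)), key=cumulative.__getitem__) + 1

print(best_k, cumulative[best_k - 1], cumulative[-1])
# prints: 4 8418 6522
```

Targeting the top 4 deciles gives a higher cumulative figure than targeting all 10, because deciles 5 to 10 each subtract from the total.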
10. Recommendations
1. Instead of randomly targeting the customer base, use the prediction model to target only customers in the
first 4 deciles for maximum profitability in future campaigns (based on the 1120-record test data).
2. Maximize profitability further by using lower-cost channels that reach these target customers
effectively, since marketing channels showed little or no influence on sales.
3. Replicate what was done in 1993 and 1994, as it seems to have created long-term loyal customers.
4. Use the model to predict sales, profitability and expected return on investment, and use these predictions to support
fact-based budget requests to management / budget approvers for future campaigns.