Coupling Traditional Strategy Thinking and Advanced Analytics
1. Ian Xiao | ian.xxiao@gmail.com | LinkedIn
How can a Premium Environmental Non-Profit
Achieve 2x Donation in 5 Years by 2020?
A Rapid Assessment in 8 Hours by Coupling Traditional Strategy
Thinking and Advanced Analytics in R
2. Background and Summary
What is this project about?
This assessment was an output of an 8-hour datathon hosted by
Data for Good, a Canadian community of data scientists.
Our mission is to help high-impact not-for-profits (NFPs) tackle
business challenges by offering pro bono analytics services.
On April 30th, 2016, we helped a global environmental NFP, BlueCo
(renamed for confidentiality), identify options to achieve two
objectives:
• Double donations in 5 years, by 2020
• Improve data management to support future analytics
What are my objectives and approach?
In my experience, many teams tend to run complicated
machine learning models and produce fancy interactive
visualizations at the get-go (probably due to the recent hype).
By falling into this trap, teams can easily lose sight of the key
business questions. In addition, teams often produce insights that
are hard to explain and hard to tie back to the objective.
My goal was to develop 3 preliminary actionable
recommendations in 8 hours. To do so, I tried to resist the
temptation to jump into machine learning and relied instead on a
traditional strategy approach.
By taking this approach, I forced myself to break down the
problem, develop hypotheses logically, and prioritize key
focus areas. Then I applied basic or advanced data and
analytics techniques to accelerate my analysis and shape
recommendations.
What are my learnings after this Datathon?
Instead of summarizing my recommendations to BlueCo, I’d like to share
a few ideas after working on this project.
Here are my thoughts:
1. Many business problems can (still) be guided with logical thinking
and solved with simple analysis based on counts, averaging,
grouping, etc.
2. Advanced analytics (data processing, machine learning, and
interactive visualization) can help consultants / business analysts
extract insight faster and enhance communication when used
correctly. Such techniques and tools can be effective, especially
when the dynamics of the problem become more complicated;
when wrongly implemented, analytics can be a black hole for
valuable time
3. In addition to investing in analytics talent and solutions,
organizations should continue to invest in data management,
especially data collection systems and processes; these tend to be
the root cause of many bad data issues, which degrade the
performance of analytics
What will you find in this presentation?
Preliminary Action Plans Analysis Visualization Pipeline Design
Note: this is not the final recommendation due to time limit; insights and
action plan can change based on future refinement of analysis and scripts
3. Quantifying the Long Term and Immediate Donation Goals
BlueCo Needs to Achieve a 14.8% YoY Growth in order to Reach 2x Donation Target in 5 Years (by 2020)
[Chart: Annual Donation by year, 2011 to 2020; y-axis from $0 to $250,000; series: Historical Annual Donation, Future Annual Donation Target]
Note: annual attrition is excluded from the calculation because data was not available; 100% retention is assumed over the years.
Reversed donation amounts are excluded.
Scope of Analysis based on Immediate Goal: obtain CAD$16,430 net new donation in 2016, assuming 100% retention from 2015
Chart annotations: Immediate 2016 Target; 2020 Donation Target
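The growth figure on this slide follows from the compound growth formula. A minimal sketch in Python (the original analysis was done in R; Python and the function name `required_yoy_growth` are used here only for illustration): doubling in 5 years needs a constant rate of 2^(1/5) - 1, which is about 14.87% and appears on the slide as 14.8%. Dividing the CAD$16,430 first-year increment by that rate gives an implied 2015 base on the order of CAD$110K; that base is an inference, not a number stated in the deck.

```python
def required_yoy_growth(multiple: float, years: int) -> float:
    """Constant year-over-year rate that scales a base amount by `multiple` in `years` years."""
    return multiple ** (1.0 / years) - 1.0


# Doubling donations in 5 years (2015 -> 2020), as stated on the slide.
rate = required_yoy_growth(2.0, 5)
print(f"Required YoY growth: {rate:.2%}")

# Immediate goal from the slide: CAD$16,430 net new donation in 2016.
# With 100% retention, net new = base * rate, so the 2015 base can be backed out.
net_new_2016 = 16_430
implied_2015_base = net_new_2016 / rate  # inferred, not a figure from the deck
print(f"Implied 2015 donation base: CAD${implied_2015_base:,.0f}")
```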
4. Preliminary Donation Analysis Framework
Identifying specific questions by breaking down the problem logically helps to design and focus the scope of analysis, especially in a short period of time
How to increase Donation $ by $16K in 1 year?
• How to increase total $ from existing donors?
    • How to increase $ per donation?
    • How to increase # of donations per year per donor?
    • How to increase # of existing donors?
• How to obtain more $ from new donors?
    • How to increase # of new leads?
    • How to increase conversion % to one-time donor?
    • How to increase $ per donation?
    • How to increase conversion % to recurring donation?
    • How to increase $ per recurring donation?
Note: Donation lost because of donor attrition was not taken into account due to limited time
Legend: In Scope for Datathon; Future Consideration due to time limit
1. Identify opportunity by exploring transaction patterns with basic quantitative analysis
2. Understand and utilize key factors that impact donation frequency through discovery using machine learning techniques (let the machine do all the work and tell us what factors are important)
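The breakdown above can be read as a simple multiplicative model, where each question in the tree corresponds to one lever. The sketch below is one possible way to express it in Python; the function names and every number in it are hypothetical placeholders, not BlueCo figures.

```python
def existing_donor_revenue(n_donors, donations_per_donor_per_year, amount_per_donation):
    """Annual $ from existing donors = # donors x # donations per year x $ per donation."""
    return n_donors * donations_per_donor_per_year * amount_per_donation


def new_donor_revenue(n_leads, conversion_rate, amount_per_donation, donations_per_year=1.0):
    """Annual $ from new donors = # leads x conversion % x # donations per year x $ per donation."""
    return n_leads * conversion_rate * donations_per_year * amount_per_donation


# Placeholder inputs (NOT BlueCo data), only to show how one lever moves the total.
baseline = existing_donor_revenue(100, 2, 10.0) + new_donor_revenue(1000, 0.05, 20.0)
more_frequent = existing_donor_revenue(100, 3, 10.0) + new_donor_revenue(1000, 0.05, 20.0)
print(f"Uplift from one extra donation per donor per year: ${more_frequent - baseline:,.0f}")
```

Comparing the uplift of each lever against the $16K goal is one way to decide which branches of the tree deserve analysis time first.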
5. Data Source Overview
Donor and Campaigns
• ~26K donor records entered between 2014 and 2015
• Demographic info
• Campaign response info
Donation Transaction
• ~22K transactions between 2011 and 2015
• Transaction details (e.g. amount, date, location, donor info)
• Donor transaction history (e.g. cumulative transaction, gift, campaign)
Facebook Data
• ~75K Facebook posts
• JSON file including detailed post info (e.g. likes, posts, sub-posts)
6. Preliminary Action #1: Capture Reversed Donation
Capturing Reversed Donation can Help BlueCo to Achieve 61% of the 2016 Target
[Chart: Reversed Donation by Year, 2011 to 2016; y-axis from 0 to 18,000; the Average Reversed Donation Amount from 2013-2015 is compared with the 2016 Incremental Goal ($16K Net New Donation)]
Follow Up Questions
• What causes reversed donation? (e.g. payment technical error, process error)
• How much will it cost to capture reversed donation, depending on the cause?
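The 61% figure is the average reversed donation over 2013-2015 divided by the 2016 net new target. The deck does not list the yearly reversed amounts, so the Python sketch below only backs out what the two stated numbers imply (61% of CAD$16,430); the `placeholder` amounts are hypothetical and serve only to show the forward calculation.

```python
TARGET_2016 = 16_430    # net new donation goal for 2016, from the slides
REPORTED_SHARE = 0.61   # share of the target quoted in the slide title


def share_of_target(reversed_by_year, years, target):
    """Average reversed donation over `years`, expressed as a fraction of `target`."""
    average = sum(reversed_by_year[y] for y in years) / len(years)
    return average / target


# Back out the average the slide implies, and the gap that would remain.
implied_average = REPORTED_SHARE * TARGET_2016
remaining_gap = TARGET_2016 - implied_average
print(f"Implied average reversed donation: ${implied_average:,.0f}")
print(f"Remaining gap to the 2016 target: ${remaining_gap:,.0f}")

# Placeholder amounts, NOT BlueCo data, only to show the forward calculation.
placeholder = {2013: 9_000.0, 2014: 10_000.0, 2015: 11_000.0}
placeholder_share = share_of_target(placeholder, [2013, 2014, 2015], TARGET_2016)
print(f"Share of target with placeholder amounts: {placeholder_share:.1%}")
```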
8. Next Steps
1. Continue to understand the cause of reversed donation
2. Further analyze the relationship between donation amount and number of recurring donations
3. Further analyze specific campaigns, such as 14CFW and 13CF, to identify techniques other
events can leverage to improve recurring donation
4. Analyze campaign 13CK to understand how to achieve high one-time donation
5. Quantify the net new donation contribution to the 2016 target for each option identified
6. Identify options and cost to improve data collection to support future donor and marketing
analytics. Some data issues identified are:
• ~97% of the birth dates are blank, which limits the information on donor demographics
to support key marketing analytics and initiatives
• Most customer data were entered in March 2014; this limits the ability to perform
cohort analysis, which will potentially hinder the analytics performance that BlueCo
is aiming to develop
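Data issues like the two above can be surfaced with very simple counts. A minimal Python sketch, assuming donor records are available as dictionaries of strings (for example from `csv.DictReader`) with an ISO-formatted entry date; the column names `birth_date` and `entry_date` and the sample rows are hypothetical, not the actual BlueCo schema.

```python
from collections import Counter


def blank_rate(rows, column):
    """Fraction of rows where `column` is missing, empty, or whitespace only."""
    if not rows:
        return 0.0
    blank = sum(1 for row in rows if not (row.get(column) or "").strip())
    return blank / len(rows)


def entry_month_counts(rows, column):
    """Count records per YYYY-MM of an ISO date column, to spot bulk-entry spikes."""
    return Counter(row[column][:7] for row in rows if row.get(column))


# Hypothetical sample rows, NOT BlueCo data.
rows = [
    {"donor_id": "d1", "birth_date": "", "entry_date": "2014-03-02"},
    {"donor_id": "d2", "birth_date": "1980-05-01", "entry_date": "2014-03-15"},
    {"donor_id": "d3", "birth_date": " ", "entry_date": "2015-01-10"},
    {"donor_id": "d4", "birth_date": None, "entry_date": "2014-03-20"},
]
print(f"Blank birth dates: {blank_rate(rows, 'birth_date'):.0%}")
print(entry_month_counts(rows, "entry_date").most_common(1))
```

A single month dominating the entry dates suggests a bulk load rather than organic sign-ups, which is why cohort analysis on that field is unreliable.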
10. Appendix A – Density Plot of Average Donation Amount
11. Appendix B – Donation Frequency and Average Amount by Campaign
Some campaigns drive recurring donations while others attract higher one-time donation amounts; 14CFW performs relatively well in both
[Chart: by campaign (11CK, 12CK, 13CF, 13CK, 14CFW, 14CK, 14CKW); percentage axis from 0% to 100%: share of donors with more than 8 donations in 1 year vs. less than 8 donations in 1 year; dollar axis from $0 to $70: average one-time donation amount]
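A chart like this one needs only grouping and counting. The Python sketch below shows one possible way to compute both measures per campaign; the record layout, the campaign code `CAMP_X`, and the sample data are hypothetical. Two definitions are assumptions because the deck does not spell them out: a donor counts as frequent when a donor-year has strictly more than 8 donations, and a one-time donation is a donor-year with exactly one donation.

```python
from collections import defaultdict


def campaign_summary(transactions, threshold=8):
    """Per campaign: share of donor-years with more than `threshold` donations,
    and the average amount among donor-years with exactly one donation."""
    per_donor_year = defaultdict(list)
    for t in transactions:
        per_donor_year[(t["campaign"], t["donor_id"], t["year"])].append(t["amount"])

    stats = defaultdict(lambda: {"donor_years": 0, "frequent": 0, "one_time": []})
    for (campaign, _donor, _year), amounts in per_donor_year.items():
        s = stats[campaign]
        s["donor_years"] += 1
        if len(amounts) > threshold:
            s["frequent"] += 1
        if len(amounts) == 1:
            s["one_time"].extend(amounts)

    return {
        campaign: {
            "frequent_share": s["frequent"] / s["donor_years"],
            "avg_one_time_amount": (
                sum(s["one_time"]) / len(s["one_time"]) if s["one_time"] else None
            ),
        }
        for campaign, s in stats.items()
    }


# Hypothetical sample, NOT BlueCo data: one donor gives 9 times, another gives once.
sample_transactions = [
    {"campaign": "CAMP_X", "donor_id": "d1", "year": 2015, "amount": 10.0}
    for _ in range(9)
] + [{"campaign": "CAMP_X", "donor_id": "d2", "year": 2015, "amount": 50.0}]
summary = campaign_summary(sample_transactions)
print(summary)
```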
12. Appendix C – Donation Frequency by Gender
Donation frequency differs by about 3 percentage points between male and female donors; the effect is
marginal, hence gender targeting can be deprioritized in marketing
[Chart: by gender (Female, Male, Unknown); y-axis from 0% to 100%: share of donors with more than 8 donations in 1 year vs. less than 8 donations in 1 year]
13. Appendix D – Data Processing and Model Pipeline and Tools
Data Exploration
• Inputs: Campaign leads table, Transaction table
• Automatic data exploration script
• Output: Insight on data cleaning and feature engineering
Data Processing + Feature Engineering
• Automatic generic data cleansing script
• Clean leads table, Clean Transaction table
• Feature Engineering + Aggregation + Table join
• Joined table with extra features
Modelling and Visualization
• XGB Model + Feature Importance Analysis
• Excel with ranked feature importance
• Aggregation + Visualization with ggplot2 or Excel
Other elements on the diagram: Ideas on data visualization; Key business questions
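The data processing stage of the pipeline (cleansing, aggregation, table join) can be sketched in a few functions. This is an illustrative Python outline, not the author's R scripts: the function names, field names, and sample rows are hypothetical, and the XGB modelling step is left out because it depends on an external library.

```python
from collections import defaultdict


def clean(rows, required):
    """Generic cleansing: strip whitespace from strings, drop rows missing a required field."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if all(row.get(field) not in (None, "") for field in required):
            cleaned.append(row)
    return cleaned


def aggregate_transactions(transactions):
    """Feature engineering: donation count, total, and average amount per donor."""
    totals = defaultdict(lambda: {"n_donations": 0, "total_amount": 0.0})
    for t in transactions:
        agg = totals[t["donor_id"]]
        agg["n_donations"] += 1
        agg["total_amount"] += float(t["amount"])
    for agg in totals.values():
        agg["avg_amount"] = agg["total_amount"] / agg["n_donations"]
    return dict(totals)


def join_features(leads, features):
    """Left join: every lead is kept; donors without transactions get zeroed features."""
    empty = {"n_donations": 0, "total_amount": 0.0, "avg_amount": 0.0}
    return [{**lead, **features.get(lead["donor_id"], empty)} for lead in leads]


# Hypothetical sample rows, NOT BlueCo data.
leads = clean(
    [{"donor_id": " d1 ", "gender": "F"}, {"donor_id": "d2", "gender": ""},
     {"donor_id": "", "gender": "M"}],
    required=["donor_id"],
)
transactions = clean(
    [{"donor_id": "d1", "amount": "20"}, {"donor_id": "d1", "amount": "40"},
     {"donor_id": "d2", "amount": ""}],
    required=["donor_id", "amount"],
)
joined = join_features(leads, aggregate_transactions(transactions))
print(joined)
```

The joined table is the kind of input a feature importance model would consume in the modelling stage.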