Although the non-profit industry has advanced using CRMs and donor databases, it has not fully explored the data stored in those databases. Meanwhile, data scientists in the for-profit industry have used sophisticated tools to generate data-driven results and effective solutions for several challenges in their organizations. Regardless of your skill level, you can equip yourself and help your organization succeed with these data science techniques using R.
Data Science for Fundraising: Build Data-Driven Solutions Using R - Rodger Devine 2018
1. Data Science for Fundraising: Build Data-Driven Solutions Using R
Rodger Devine
April 11, 2018
Meeting of the Minds 2018
2. Session Overview
Welcome
Data Science for Fundraising
Introduction to R and R Studio
Loading and Preparing Data with R
Building Solutions Using R – Analysis
Building Solutions Using R – Visualization
Building Solutions Using R – Model Building
Takeaways
3. Session Overview
• We will cover a lot of ground
• Solution details and additional support references
will be provided
5. Introduction
• Experience: 15+ years in higher education, software
development, enterprise IT delivery, financial systems
integration, data analytics, advancement operations,
program management, team building and executive
leadership
• Graduate Studies: search engine technology, data analysis,
applied statistics, organizational management, etc.
• Expertise: information retrieval, natural language
processing, statistical modeling, recommender systems,
business process design, strategic planning and consulting
15. Data Science
• Converting data into actionable insights
• Identifying trends
• Recognizing patterns
• Visualizing complex data
• Making future predictions using past data
17. Data Analytics Maturity
• Descriptive (Hindsight) - What happened? Why?
• Predictive (Insight) - What will happen?
• Prescriptive (Foresight) - What should we do?
23. Data Science for Fundraising
• Improve results (ROI, results, outcomes)
• Increase efficiencies (resources, time)
24. Donor Lifecycle
Identification
Who will you ask and what will
you ask for?
Qualification
Verify prospect capacity,
inclination, philanthropic priorities.
Cultivation
Building relationships, engaging
the prospect, and preparing to
solicit.
Solicitation
Making the ask.
Stewardship
Donor recognition and
engagement.
25. Donor Lifecycle + Machine Learning
Data-driven methods to sort, rank, and prioritize prospect
management workflow across the donor lifecycle
Identification
Qualification
Cultivation
Solicitation
Stewardship
30. Machine Learning Process
• Understand Your Goal
• Collect Data
• Prepare Data
• Build Your Model
• Evaluate Model Results
• Apply Insights
CRISP-DM:
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
39. Decision Support and Actionable Insights
Sort, Rank, Prioritize
• Identify high-value prospects using
machine learning
• Predict first-time gift, next gift or
lifetime giving
• Prioritize prospects for individual
solicitation via e-solicitation, direct
mail, leadership appeal or face-to-face
solicitation
40. Segmentation Opportunities
• Retention: Identify donors most
likely to give again or at risk of non-giving
• Upgrade: Identify donors likely to
give more
• Acquisition: Identify highly inclined
donors and non-donors
49. Machine Learning Example Data and Script
• Download “data.zip” to your desktop
o https://tinyurl.com/ycnd76qc
• Open “data.zip” on your desktop
• This will unpack the data file and script to the “data” folder
50. Running the Script
• Copy and paste lines of code into the console and
press ‘Return’ OR
• Select code and press Command + Return (Mac) to
manually run current line/code selection
• Select code and press Control + Enter (Windows)
to manually run current line/code selection
51. Learning Mindset
• Learning is an iterative process.
• Seek to fail early and often.
• Be patient and positive as you learn.
• Read the documentation.
• When in doubt, Google to the rescue.
53. Install R Libraries
# Install tidyverse library for data science
install.packages("tidyverse")
# Install caret for modeling
install.packages("caret")
54. Load R Libraries
# Load tidyverse library for data science
library("tidyverse")
# Load caret for modeling
library("caret")
56. Loading Data
# Load readr from tidyverse library
library("readr")
# Load CSV file - Mac Users
data <- read_csv("~/Desktop/data/DonorSampleDataML.csv")
# Load CSV file - Windows Users (replace "username" with your Windows account name)
data <- read_csv("C:/Users/username/Desktop/data/DonorSampleDataML.csv")
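After loading a file, a quick first look helps confirm that it was read as expected. The sketch below is only an illustration: it uses a few made-up rows and base R's `read.csv()` on inline text so that it runs without the session file, and the column values are assumptions rather than the contents of DonorSampleDataML.csv. The same inspection functions also work on the object returned by `read_csv()`.

```r
# Made-up rows for illustration only; the real file has its own columns and values.
csv_text <- "ID,AGE,GENDER,DONOR_IND
1,34,Female,Y
2,NA,Male,N
3,58,Female,Y
4,45,Male,N"

# Base R read.csv() stands in for readr::read_csv() so no file is needed.
sample_data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(sample_data)        # number of rows and columns
str(sample_data)        # column names and types
head(sample_data, 3)    # first few rows

# Count missing values in each column
missing_per_column <- colSums(is.na(sample_data))
missing_per_column
```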
71. Change Data Types
# Convert Wealth Ratings to Factors
data$WEALTH_RATING <- with(data, factor(WEALTH_RATING,
  levels = c('$1-$24,999', '$25,000-$49,999', '$50,000-$99,999', '$100,000-$249,999',
    '$250,000-$499,999', '$500,000-$999,999', '$1,000,000-$2,499,999',
    '$2,500,000-$4,999,999', '$5,000,000-$9,999,999', '$10,000,000-$24,999,999'),
  ordered = TRUE))
A factor is a basic R data structure used for storing categorical data.
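To see what `ordered = TRUE` adds, here is a minimal sketch using a shortened, made-up set of ratings (three of the levels from the slide and four invented values). With an ordered factor, comparisons and `max()` follow the level order you declare, not alphabetical order.

```r
# Made-up ratings for illustration; levels are a shortened version of the slide's list.
rating_levels <- c("$1-$24,999", "$25,000-$49,999", "$50,000-$99,999")
ratings <- c("$50,000-$99,999", "$1-$24,999", "$25,000-$49,999", "$1-$24,999")

wealth <- factor(ratings, levels = rating_levels, ordered = TRUE)

levels(wealth)    # levels in the order we declared them
table(wealth)     # counts per level, in level order

# Comparisons follow the declared order of the levels
at_least_25k <- wealth >= "$25,000-$49,999"
at_least_25k

# The highest rating present in the data
top_rating <- max(wealth)
top_rating
```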
76. Build Solutions Using R – Analysis
# Calculate Average Donor Age
mean(data$AGE)
77. Build Solutions Using R – Analysis
# Calculate Average Giving by Donor
mean(data$TotalGiving)
78. Build Solutions Using R – Analysis
# Calculate Average Giving by Donor and Round to Nearest Dollar
round(mean(data$TotalGiving))
79. Build Solutions Using R – Analysis
# Calculate Average Giving by Donor and Store into Variable
donor_avg_giving <- round(mean(data$TotalGiving))
# Print Average Giving by Donor
donor_avg_giving
80. Build Solutions Using R – Analysis
# Calculate Median Giving by Donor
median(data$TotalGiving)
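Mean and median can tell very different stories for giving data, which is often skewed by a few large gifts. The sketch below uses made-up giving totals (not values from the session file) to show the difference, and how `na.rm = TRUE` handles missing values, should your own data contain any.

```r
# Made-up giving totals for illustration; one very large gift and one missing value.
total_giving <- c(25, 50, 50, 100, 250, 100000, NA)

# Without na.rm, a single missing value makes the result NA
mean(total_giving)

# Ignore missing values when summarizing
avg_giving <- mean(total_giving, na.rm = TRUE)
med_giving <- median(total_giving, na.rm = TRUE)

round(avg_giving)   # pulled up by the single large gift
med_giving          # closer to the typical donor in this toy data

# Minimum, quartiles, mean, maximum and NA count in one call
summary(total_giving)
```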
89. Define a Business Problem
Goal = Identify prospects most likely to give based on
available past data
90. Translate Problem into Model Requirements
Goal = Identify prospects most likely to give based on
available past data
Outcome variable = Dependent Variable
Input variables = Independent Variables
91. Translate Problem into Model Requirements
• Outcome variable = Donor or Non-donor
• Input variables = Age, Marital Status, Degree Level, etc.
92. Selecting Models and Features
• Logistic regression is one suitable model choice for
predicting a categorical output (“Yes or No”, “Donor or
Non-Donor”, etc.)
• Binary logistic regression predicts a dependent variable
with two categorical outcomes
• Note: Model and feature (predictor/variable)
selection often requires extensive exploration, analysis,
domain knowledge and critical thinking
• Today, we are building a quick baseline model to help
you get started on your data science journey!
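As a rough illustration of how binary logistic regression turns inputs into a "Donor or Non-Donor" answer: the model adds up the inputs on a log-odds scale, and the logistic function converts that sum into a probability. The coefficients below are hypothetical numbers picked for the example, not fitted to any data.

```r
# Hypothetical coefficients, chosen only for illustration (not fitted to any data).
intercept <- -1.5
coef_age  <- 0.04   # change in log-odds per year of age

age <- c(25, 45, 65)

# Linear predictor on the log-odds scale
log_odds <- intercept + coef_age * age

# The logistic function maps log-odds to a probability between 0 and 1.
# plogis(x) is base R's implementation of 1 / (1 + exp(-x)).
prob_donor <- plogis(log_odds)

# Apply a 0.5 cutoff to turn each probability into a class label
label <- ifelse(prob_donor > 0.5, "Donor", "Non-Donor")
data.frame(age, log_odds, prob_donor, label)
```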
93. Load R Model Libraries and Data
# Load readr
library(readr)
# Load dplyr
library(dplyr)
# Load caret
library(caret)
# Load Data
data <- read_csv("~/Desktop/data/DonorSampleDataML.csv")
94. Inspect Data Set
# Drop 'ID' variable
data <- select(data, -ID)
# Table of donor vs. non-donor count
table(data$DONOR_IND)
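Counts are easier to interpret alongside proportions, because the share of the larger class tells you how accurate a model would be if it always guessed that class. A minimal sketch with made-up indicator values (the real counts come from your own file):

```r
# Made-up indicator values for illustration only.
donor_ind <- c("Y", "N", "N", "Y", "N", "N", "N", "Y", "N", "N")

counts <- table(donor_ind)      # counts per class
shares <- prop.table(counts)    # proportions per class
round(shares, 2)

# Accuracy of always guessing the majority class; a useful baseline to beat
baseline_accuracy <- max(shares)
baseline_accuracy
```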
95. Select Model Variables
# Model variables: categorical predictors plus the outcome (DONOR_IND)
pred_vars <- c('MARITAL_STATUS', 'GENDER',
  'ALUMNUS_IND', 'PARENT_IND', 'HAS_INVOLVEMENT_IND',
  'DEGREE_LEVEL', 'PREF_ADDRESS_TYPE',
  'EMAIL_PRESENT_IND', 'DONOR_IND')
# Keep only the model variables (this drops 'ID', 'MEMBERSHIP_ID', etc.)
data <- select(data, all_of(pred_vars), AGE)
96. Convert Model Variable Types
# Convert features to factor
data <- mutate_at(data,
  .vars = pred_vars,
  .funs = as.factor)
# Set Seed for Repeatable Results
set.seed(123)
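Why convert to factors? A model cannot use text labels directly, so R expands each factor into 0/1 indicator columns behind the scenes. The sketch below uses a small made-up data frame and base R only (`lapply()` in place of `mutate_at()`); the column names echo the slides, but the values are invented.

```r
# Made-up rows for illustration only.
toy <- data.frame(
  GENDER = c("Female", "Male", "Female", "Unknown"),
  DEGREE_LEVEL = c("Graduate", "Undergrad", "Undergrad", "Graduate"),
  AGE = c(34, 51, 46, 29),
  stringsAsFactors = FALSE
)
cat_vars <- c("GENDER", "DEGREE_LEVEL")

# Base R way to convert several columns to factors at once
toy[cat_vars] <- lapply(toy[cat_vars], as.factor)
str(toy)

# What a model sees: each factor becomes 0/1 indicator columns,
# with the first level of each factor used as the reference category.
design <- model.matrix(~ ., data = toy)
colnames(design)
```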
97. Build Training and Test Data Sets
# Create Training and Test Data Set Indices
dd_index <- sample(2, nrow(data),
  replace = TRUE,
  prob = c(0.7, 0.3))
# Build Training Data Set
dd_trainset <- data[dd_index == 1, ]
# Build Test Data Set
dd_testset <- data[dd_index == 2, ]
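It can help to check what this kind of split produces. The sketch below applies the same `sample()` call to a made-up data frame of 1,000 rows (a stand-in, not the donor file) and confirms two things: every row lands in exactly one set, and the 70/30 proportion is approximate, not exact, because each row is assigned at random.

```r
set.seed(123)

# Made-up data frame standing in for the donor data (1,000 rows).
toy <- data.frame(row_id = 1:1000)

# Same technique as the slide: draw a 1 or 2 for every row,
# with probability 0.7 and 0.3 respectively.
idx <- sample(2, nrow(toy), replace = TRUE, prob = c(0.7, 0.3))

train <- toy[idx == 1, , drop = FALSE]
test  <- toy[idx == 2, , drop = FALSE]

# Every row lands in exactly one of the two sets
nrow(train) + nrow(test) == nrow(toy)

# Share of rows in the training set (close to, but not exactly, 0.7)
round(nrow(train) / nrow(toy), 3)

# Number of rows that appear in both sets (should be zero)
length(intersect(train$row_id, test$row_id))
```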
98. Build a Model
# Build Model: DONOR_IND is the outcome variable;
# "." means "all other variables" (the input variables)
giving_lr_model <- glm(DONOR_IND ~ ., data = dd_trainset,
  family = "binomial")
# Store Predictions (predicted probabilities)
predictions <- predict(giving_lr_model,
  newdata = dd_testset, type = "response")
# Convert Predictions to Y or N
preds <- as.factor(ifelse(predictions > 0.5, "Y", "N"))
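The "Evaluate Model Results" step from the machine learning process can be sketched by comparing predictions with the actual outcomes in the test set. The example below is a self-contained illustration on simulated data (not the session's donor file, and not necessarily the evaluation used in the session), using only base R. The caret package loaded earlier also provides a `confusionMatrix()` function that reports similar figures.

```r
set.seed(123)

# Simulated data for illustration only (not the session's donor file).
n <- 500
age <- round(runif(n, 22, 80))
alumnus <- factor(sample(c("N", "Y"), n, replace = TRUE))
# Simulated outcome: giving is made more likely for older constituents and alumni.
p <- plogis(-3 + 0.04 * age + 0.8 * (alumnus == "Y"))
donor <- factor(ifelse(runif(n) < p, "Y", "N"), levels = c("N", "Y"))
sim <- data.frame(DONOR_IND = donor, AGE = age, ALUMNUS_IND = alumnus)

# Split, fit, and predict using the same steps as the slides
idx <- sample(2, n, replace = TRUE, prob = c(0.7, 0.3))
train <- sim[idx == 1, ]
test  <- sim[idx == 2, ]
model <- glm(DONOR_IND ~ ., data = train, family = "binomial")
probs <- predict(model, newdata = test, type = "response")
preds <- factor(ifelse(probs > 0.5, "Y", "N"), levels = c("N", "Y"))

# Confusion matrix: rows are predictions, columns are actual outcomes
conf_mat <- table(Predicted = preds, Actual = test$DONOR_IND)
conf_mat

# Accuracy: share of test rows where prediction and actual outcome agree
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

Setting the factor levels explicitly on `preds` keeps the table 2 x 2 even if the model happens to predict only one class.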
116. Takeaways
• The purpose of analytics is to add value.
• Lead with business purpose to guide solutions.
• Start where you are right now.
• Solicit feedback often.
117. Takeaways
• Fail early and often to drive improvement with real users.
• Get involved with the MOTM community.
• Build a community of support and expertise within and
across your organization.
• Be a learn-it-all, not a know-it-all.
123.
Data Science for Fundraising: Build Data-Driven Solutions Using R
http://nandeshwar.info/ds4fundraising/
Quick-R Built-in Functions
https://www.statmethods.net/management/functions.html