This paper aims to build a binary classification model to help non-profit organizations efficiently target likely donors for direct mail campaigns. The authors use a dataset of over 1 million records containing demographic and campaign attributes to select relevant features and split the data into training and test sets. Several classification algorithms are tested on the data, with a neural network found to have the lowest false positive error rate, which is important to minimize costs. The authors further tune the neural network structure and regularization to optimize performance, and select a classification threshold that balances errors to maximize estimated net returns.
Accurate Campaign Targeting Using Classification Algorithms
CS 229 Project Final Paper
Jieming Wei, Sharon Zhang
Introduction
Many organizations prospect for loyal supporters and donors by sending direct-mail appeals. This is an effective way to build a large base, but it can be very expensive and inefficient. Eliminating likely non-donors is the key to running an efficient prospecting program and ultimately to pursuing the organization's mission, especially when funding is limited (as it is in most campaigns).
The raw dataset, with 16 mixed (categorical and numeric) features, is from the Kaggle website – Raising Money to Fund an Organizational Mission. We built a binary classification model to help organizations build their donor bases as efficiently as possible by applying machine learning techniques, including Multinomial Logistic Regression, Support Vector Machines, and Linear Discriminant Analysis. We aimed to help them separate likely donors from unlikely donors and target direct-mail campaigns at the most receptive prospects.
Data Preparation: feature selection, training set and test set selection
The original dataset is 13-dimensional with 1048575 observations on demographic features of prospects and campaign attributes.
Demographic: zip code, mail code, company type.
Prospect attributes: income level, job, group.
Campaign attributes: campaign causes, campaign phases.
Result: success/failure
In order to serve the purpose of this study, we adapted the dataset, reducing it to a 10-dimensional dataset that keeps the relevant features, and selected a training set and a test set:
Feature selection
We selected features based on their relevance to the campaign result, estimated by the correlation of each feature with the result. We kept the 7 features with relatively large correlations and found that donor income level is the most correlated variable.
Table 1: Feature analysis (correlation with result)

company type    -0.016
project type    -0.0008
cause1          -0.0765
cause2          -0.0416
phase            0.0149
zip              0.071
donor income     0.248
result           1
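This correlation screen is simple to sketch. The DataFrame below is synthetic stand-in data using the paper's column names, not the actual Kaggle dataset, and the label is constructed to depend weakly on income so the ranking is non-trivial:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the campaign data; column names follow the paper.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "company_type": rng.integers(1, 7, n),
    "project_type": rng.integers(1, 11, n),
    "cause1": rng.integers(1, 3, n),
    "cause2": rng.integers(1, 12, n),
    "phase": rng.integers(0, 20, n),
    "zip": rng.integers(10000, 99999, n),
    "donor_income": rng.lognormal(12, 1, n),
})
# Make the binary result weakly depend on (log) income, plus noise.
df["result"] = (np.log(df["donor_income"]) + rng.normal(0, 1.5, n)
                > np.log(df["donor_income"]).mean()).astype(int)

# Rank features by absolute Pearson correlation with the result, as in Table 1.
corr = df.corr()["result"].drop("result")
ranked = corr.abs().sort_values(ascending=False)
print(ranked)
```

With this construction, donor income comes out as the most correlated feature, mirroring the paper's finding.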
Original dataset: 13-dimensional with 1048575 observations.
Features: Project ID, Mail ID, Mail Code ID, Prospect ID, Zip, Zip4, Vector major, Vector Minor, Package ID, Phase, Database ID, JobID, Group, ID
[Figure: Income histogram (x: natural log of income, y: frequency) and Phase histogram (x: phase, y: frequency)]
Processed dataset: 7-dimensional. Training set: 300 observations. Test set: 100 observations.
Features:
Company type (campaign organization type),
Project type,
Cause 1 (indicating whether the campaign is candidate or non-profit),
Cause 2 (further classifying causes into policy, tax, PAC, CNP, congress, MEM, etc.),
Phase,
Zip,
Candidate income,
*result,
where *result is a binary value indicating good/bad prospects. It is the label on which we train and against which we gauge predictions. The distributions of these variables are displayed below.
Distribution visualization
We generated histograms to understand the distributions of the features. As candidate income, the most relevant feature, has large variance, we applied a log transformation to visualize its distribution, as well as its distribution with respect to result.

[Figure: Project Type histogram (x: project type), Cause1 histogram (x: candidate/non-profit), Cause2 histogram (x: causes), and Company Type histogram (x: company type); y-axis: frequency]
Training set and Test set Selection
We created a training set by randomly selecting 300 observations from the original dataset, and a test set of 100 observations. We then replaced categorical variables with numeric codes:
Cause 1: 1, 2 encode candidate/non-profit.
Cause 2: 1–11 encode congress, CNP, tax, policy, DEF, senate, PAC, MIL, MEM, IMM, REL.
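A minimal sketch of this preprocessing step, assuming records are dicts and using the code orderings listed above (the field and function names here are hypothetical, not from the paper):

```python
import random

# Numeric codes for the categorical cause fields, in the paper's stated order.
CAUSE1_CODES = {"candidate": 1, "non-profit": 2}
CAUSE2_CODES = {name: i for i, name in enumerate(
    ["congress", "CNP", "tax", "policy", "DEF", "senate",
     "PAC", "MIL", "MEM", "IMM", "REL"], start=1)}

def encode(record):
    """Return a copy of one prospect record with the categorical cause
    fields replaced by their numeric codes."""
    record = dict(record)
    record["cause1"] = CAUSE1_CODES[record["cause1"]]
    record["cause2"] = CAUSE2_CODES[record["cause2"]]
    return record

def split(records, n_train=300, n_test=100, seed=229):
    """Randomly draw disjoint training and test sets of the stated sizes."""
    rng = random.Random(seed)
    drawn = rng.sample(records, n_train + n_test)  # sampling without replacement
    return drawn[:n_train], drawn[n_train:]
```

Sampling the 400 records in one draw guarantees the training and test sets are disjoint.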
Methods
We tried the following approaches to classify prospects into two classes, namely donors who are likely to donate and those who are not: Multivariate Regression, SVM, Decision Tree, Neural Network, and Random Forest. Training errors, test errors, false positive error rates, and false negative error rates were calculated for each model.
Model           Training error   Overall test error   False positive   False negative
MVR             0                0.31                 0.25             0.06
SVM             0.323            0.54                 0.18             0.36
Neural Nets     --               0.378                0.174            0.204
Random Forest   0.003            0.47                 0.23             0.24
We noticed that the test errors for most models are relatively high, with the MVR and Random Forest models showing high variance and the SVM showing high bias. False positive error rates are high for most models, with the Neural Net model having the lowest. We want to minimize the false positive error so as to save costs for campaign companies.
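Note that in each row of the table the false positive and false negative rates sum to the overall test error, so both are evidently fractions of all test examples rather than of one class. A small sketch of that bookkeeping (the helper name is hypothetical):

```python
import numpy as np

def error_rates(y_true, y_pred):
    """Overall, false-positive, and false-negative error rates, each as a
    fraction of ALL test examples (so FP + FN = overall), matching how
    the rows of the results table add up."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    fp = np.sum((y_pred == 1) & (y_true == 0)) / n  # predicted donor, actually not
    fn = np.sum((y_pred == 0) & (y_true == 1)) / n  # predicted non-donor, actually donor
    return fp + fn, fp, fn

# Example mirroring the MVR row: 25 false positives and 6 false negatives
# out of 100 test cases give 0.25 + 0.06 = 0.31 overall error.
```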
Algorithm Selection
From the initial fitting with different algorithms, we found that the dataset is somewhat messy and irregular in its patterns: there is randomness in whether a given prospect ends up donating or not. This is a characteristic of this dataset, and it is not uncommon in real-world datasets.
When a dataset has this degree of randomness, it is hard to lower the overall error rate; however, we can still distinguish false positives from false negatives and specifically lower whichever error type yields better profit, saves more effort, or otherwise makes more sense. In this case, we want to lower the false positive error, because when a prospect is mistakenly classified as "will donate", time and money are spent and wasted.

[Figure: natural log of income vs. result]
We chose the Neural Network for this experiment, since it has the lowest false positive rate. There are three steps: first choose the number of hidden units and the threshold for the structural model, then choose the optimal regularization parameter under this structure, and lastly choose the threshold that minimizes the false positive error.
1) Pick the number of hidden units and the threshold for the structural model
We ran a grid experiment over 10 hidden-unit counts and 20 threshold values ranging from -0.2 to 0.2. For each pair we calculated an overall estimate of the misclassification rate, giving a 20 × 10 matrix. Averaging over rows and over columns, we chose the hidden-unit count and the threshold with the lowest average error. The result of applying the averages is shown below.
Thus, we use -0.14 as the threshold and 8 hidden units for our structural model.
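The row/column averaging step is easy to sketch. The error matrix below is synthetic (in the paper each entry comes from refitting the network for that threshold/size pair), but the selection logic is the same:

```python
import numpy as np

rng = np.random.default_rng(229)
thresholds = np.round(np.linspace(-0.2, 0.2, 20), 3)  # 20 candidate thresholds
net_sizes = np.arange(1, 11)                           # 1..10 hidden units

# Stand-in for the 20 x 10 matrix of estimated misclassification rates
# (rows: thresholds, columns: hidden-unit counts).
errors = rng.uniform(0.35, 0.60, size=(20, 10))

# Score each threshold by its average over sizes, each size by its average
# over thresholds, and take the minimizer of each, as described in the text.
threshold_avg = errors.mean(axis=1)
size_avg = errors.mean(axis=0)
best_threshold = thresholds[threshold_avg.argmin()]
best_size = net_sizes[size_avg.argmin()]
print(best_threshold, best_size)
```

Averaging before minimizing trades some optimality for robustness: it avoids committing to a single (threshold, size) cell whose low error may just be noise.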
2) Determine the optimal regularization under the chosen structural model
In this step we choose the optimal regularization (weight decay in {0, 0.1, …, 1}) for the structural model found above, by averaging our estimates of the overall misclassification error on the test set over 10 runs with different random starting values. From this tuning process we found that a decay parameter of 0.5 yields the lowest overall misclassification.
[Figures: percentage of misclassification error vs. number of nets, vs. threshold, and vs. decay level, each with the selected value marked]
[Figure: Net Collected $ with Different Threshold Values]
3) Determine the threshold that minimizes false positive misclassifications
One way to reduce the number of false-positive misclassified donors is to choose the threshold for classifying a prospect as a donor or non-donor: with a higher "bar" for classifying a prospect as a donor, we keep only the prospects we are most certain will donate.
We tried 16 different threshold values and charted the results in four categories: positive, false negative, false positive, and negative. When the threshold is -0.3, every prospect is classified as a non-donor, so the false positive error is 0. As the threshold increases, false positive errors increase while false negative errors decrease.
The question now is to find a balance point between the two error types so that the overall campaign is cost-effective. We use the following formula:
Net Collected ($) = Donation ($) × Donors (M) − Package Cost ($) × Packages (N)
Where Donation ($) and Package Cost ($) are assumptions, Donors (M) corresponds to the number of positive occurrences, and Packages Sent (N) corresponds to the sum of positive and false-positive occurrences, since the model predicts those prospects will donate. Using the assumptions Donation = $50 and Package Cost = $20, we observe the Net Collected trend shown in the accompanying chart.
Thus, the most cost-effective threshold is r = -0.1, with a false-positive rate of 18%.
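The cost formula is simple enough to compute directly; a minimal sketch under the paper's stated assumptions ($50 donation, $20 per package; the function name is hypothetical):

```python
def net_collected(true_pos, false_pos, donation=50.0, package_cost=20.0):
    """Net Collected ($) = Donation * Donors (M) - Package Cost * Packages (N),
    where M is the number of true positives and N = M + false positives,
    since a package is mailed to every prospect predicted to donate."""
    m = true_pos
    n_packages = true_pos + false_pos
    return donation * m - package_cost * n_packages

# Under these assumptions each true positive nets $30 and each false
# positive costs $20, so the threshold trades the two off directly.
print(net_collected(10, 5))  # → 200.0
```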
[Figure: Classification Results (%) with Different Threshold Values; series: positive, false-negative, false-positive, negative. False-positive data labels across the 16 thresholds: 0, 2, 7, 10, 10, 11, 13, 14, 17, 17, 18, 19, 21, 22, 23, 23]