2. Problem Description
Task: Predict whether a DonorsChoose project will get funded.
Experience: DonorsChoose data from September 2002 to the present.
Performance: Classification accuracy, the number of correct predictions divided by the total number of predictions made.
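Here is a minimal sketch of the accuracy metric defined above, in plain Python. The labels are hypothetical: 1 means funded and 0 means not funded.

```python
def classification_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one prediction")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 1 = project funded, 0 = not funded.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(classification_accuracy(y_true, y_pred))  # 0.6 (3 of 5 correct)
```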
5. Features
Feature: Description (data type)
total_price_excluding_optional_support: Total price of the project, in dollars (integer)
students_reached: Number of students the project reaches (integer)
school_type: Type of school, one of charter, magnet, year_round, nlns, kipp, charter_ready_promise (categorical)
date_posted: Date the project was posted (categorical)
resource_type: Type of resources the project requests (categorical)
grade_level: Grade level of the project (categorical)
poverty_level: Poverty level (categorical)
school_state: State the project was posted from (categorical)
Eligible_double_your_impact_match: Whether the project was eligible to be matched (categorical)
teacher_prefix: Prefix of the teacher posting the project (categorical)
primary_focus_area: The project's primary area of focus (categorical)
primary_focus_subject: The project's primary subject of focus (categorical)
(Original features)
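Slide 22 refers to columns such as Resource_Type_Books and Resource_Type_Technology. That naming suggests the categorical features above were one-hot encoded, possibly with pandas `get_dummies`, although the slides do not say which tool was used. The sketch below shows that encoding in pure Python. It assumes a `Resource_Type` prefix, and the column values are a hypothetical sample.

```python
def one_hot(values, prefix):
    """One-hot encode category strings into {column_name: [0/1 per row]}."""
    categories = sorted(set(values))  # one indicator column per distinct value
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

# Hypothetical sample of the resource_type column.
resource_type = ["Books", "Technology", "Books", "Supplies"]
encoded = one_hot(resource_type, prefix="Resource_Type")
for column, indicator in encoded.items():
    print(column, indicator)
# Resource_Type_Books [1, 0, 1, 0]
# Resource_Type_Supplies [0, 0, 0, 1]
# Resource_Type_Technology [0, 1, 0, 0]
```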
19. The 3 Models
1. AdaBoost
2. Random Forest
3. Logistic Regression
20. GridSearch Scores (F1 Metric)
Model | F1 Score | Best Parameter
Random Forest | 0.759 | criterion: entropy
AdaBoost | 0.7676 | n_estimators: 60
Logistic Regression | 0.811 | penalty: L2
Simplest model with the best score: Logistic Regression
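The slides report the best hyperparameter for each model but not how the search was run. scikit-learn's `GridSearchCV` is a likely tool, but that is an assumption. The pure-Python sketch below shows the idea. It scores every combination in a parameter grid with F1, the harmonic mean of precision and recall, and keeps the best one. The "model" is a hypothetical threshold classifier over made-up scores. It stands in for retraining Random Forest, AdaBoost, or Logistic Regression on each setting. A real search would also cross-validate rather than score a single fixed set.

```python
from itertools import product

def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # F1 is 0 by convention when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def grid_search(param_grid, evaluate):
    """Score every parameter combination; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical model scores and true labels (1 = funded, 0 = not funded).
scores = [0.9, 0.8, 0.65, 0.4, 0.3, 0.2]
y_true = [1, 1, 0, 1, 0, 0]

def evaluate(params):
    """Stand-in for 'fit the model with params, then compute F1 on held-out data'."""
    y_pred = [1 if s >= params["threshold"] else 0 for s in scores]
    return f1_score(y_true, y_pred)

best, score = grid_search({"threshold": [0.25, 0.5, 0.75]}, evaluate)
print(best, round(score, 4))  # {'threshold': 0.75} 0.8
```

In scikit-learn, the comparable call would be `GridSearchCV(estimator, param_grid, scoring="f1")`. For AdaBoost, for example, `param_grid` would list candidate `n_estimators` values.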
22. Using Only the 5 Most Significant Features
1. Total_price_excluding_optional_support
2. Eligible_double_your_impact_match
3. Resource_Type_Books
4. Resource_Type_Technology
5. price_per_student
New score with Logistic Regression: 0.8171
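price_per_student does not appear in the original feature list, so it was presumably derived from other columns. The sketch below assumes it is total_price_excluding_optional_support divided by students_reached; the slides do not give the formula. The helper name and the handling of zero students are hypothetical.

```python
def price_per_student(total_price, students_reached):
    """Hypothetical derived feature: project cost divided by students reached.

    Returns None when students_reached is missing or zero, so the caller can
    decide how to impute the value instead of dividing by zero.
    """
    if not students_reached:
        return None
    return total_price / students_reached

print(price_per_student(600, 30))  # 20.0
print(price_per_student(450, 0))   # None
```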
23. Overview
● Logistic Regression improved the F1 score by 0.1271 over the baseline.
● Most of the predictive power lies in 5 features.
● Ethical implications:
○ The most predictive features cannot be changed without fabricating information.
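The slides do not state the baseline score. Working backward from the reported gain gives the implied baseline under two possible readings, and the slides confirm neither. The first reading measures the gain from the 5-feature score of 0.8171. The second measures it from the full-feature GridSearch score of 0.811.

```latex
\begin{aligned}
\text{baseline}_{\text{5 features}}   &= 0.8171 - 0.1271 = 0.6900 \\
\text{baseline}_{\text{all features}} &= 0.8110 - 0.1271 = 0.6839
\end{aligned}
```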
24. Model Improvements
● Add prescriptive data:
○ Project essays
○ Project materials
● Use location-based data:
○ Census
● Skewed data:
○ Find the reasons for the skew
○ Find methods to correct for it
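One common way to handle skewed labels is class reweighting, which makes errors on the rarer class cost more during training. The slides do not say which method was tried, so this is only one option. The sketch uses the formula behind scikit-learn's `class_weight="balanced"` (for example, `LogisticRegression(class_weight="balanced")`). The 8-to-2 label split is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_class).

    Rarer classes get larger weights, so they count more in the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical skewed labels: 8 funded (1), 2 not funded (0).
labels = [1] * 8 + [0] * 2
print(balanced_class_weights(labels))  # {1: 0.625, 0: 2.5}
```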