1. Predictive Analytics 101:
An overview of how to create a dataset and
model to identify students at risk of attrition
Karen DeSantis
Senior Analyst
Office of Planning, Assessment and Institutional Research
Pace University
Pace’s Inaugural Retention Conference
June 16, 2017
2. Data Types and Sources
• Demographic
• Economic
• High school specific
• Pace specific
• Dates and deadlines
• Census
• Applications (Pace University and Financial Aid)
• Orientation
– BCSSE (Beginning College Survey of Student Engagement)
– Placement tests
• Historical data
3. Variables
• Demographic
– Gender, Age, Race, International, Underrepresented Minority
• Economic
– Financial Aid package, Tuition, Unmet need, Grants
• High school specific
– GPA, test scores (SAT, ACT, etc.)
– BCSSE responses, Placement data (from Orientation)
• Pace specific
– School, Campus, Residence, Major, CAP or Honors, Legacy, Athlete
• Dates and commitment
– Deposit Date, Attended orientation
• End of Semester Data:
– Starfish, Event attendance, End of semester GPA
4. Models
• Identified the dependent variable: whether a student will
leave the University
– One semester (Fall to Spring semesters) – only a small percentage leave
– One year (Fall to Fall semesters) – up to 25% leave
• Gathered historical data for the 2013, 2014, and 2015 First Year,
Full Time cohorts
• Gathered data for the 2016 First Year, Full Time cohort
• Data cleaning takes more time than you expect
– Variables may be missing
– Some students did not take BCSSE, SATs or complete FAFSA forms
– Recoding of variables into binary variables (0,1)
– Rescaling variables such as financial aid to a relative scale
rather than using absolute values
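The cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the sample records, and the choice to express aid as a share of tuition are hypothetical and are not taken from the Pace dataset.

```python
# Hypothetical student records; field names and values are illustrative only.
students = [
    {"id": 1, "residence": "On campus", "sat": 1100, "aid": 20000, "tuition": 40000},
    {"id": 2, "residence": "Commuter", "sat": None, "aid": 10000, "tuition": 40000},
    {"id": 3, "residence": "On campus", "sat": 1250, "aid": None, "tuition": 40000},
]


def clean(record):
    """Return a cleaned copy with binary recodes and a relative aid measure."""
    out = {"id": record["id"]}
    # Recode a categorical field into a 0/1 indicator.
    out["resident"] = 1 if record["residence"] == "On campus" else 0
    # Flag missing test scores so the gap is visible instead of hidden.
    out["sat_missing"] = 1 if record["sat"] is None else 0
    out["sat"] = record["sat"]
    # Assumption: express aid as a share of tuition rather than in dollars.
    if record["aid"] is None or not record["tuition"]:
        out["aid_share"] = None
    else:
        out["aid_share"] = record["aid"] / record["tuition"]
    return out


cleaned = [clean(s) for s in students]
```

Keeping an explicit missing-value flag makes it easier to see later how many students would drop out of an analysis that requires complete cases.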
5. Model – Variable selection
• Which variables correlated with the dependent variable in
the historical data?
o SAT scores
o High School GPA
o Placement scores
o Undecided majors
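One simple way to screen variables like these is to compute the Pearson correlation between each candidate and the 0/1 attrition outcome (with a binary outcome this is the point-biserial correlation). The sketch below uses made-up values and hypothetical variable names; it does not reproduce the deck's actual screening procedure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up historical data: 1 = attrited, 0 = returned.
attrited = [1, 0, 0, 1, 0, 1]
candidates = {
    "hs_gpa": [2.5, 3.6, 3.8, 2.9, 3.4, 2.7],
    "undecided": [1, 0, 0, 1, 1, 0],
}

correlations = {name: pearson(values, attrited) for name, values in candidates.items()}

# Rank candidates by the strength of the relationship, ignoring its sign.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
```

In this toy data, high school GPA is negatively correlated with attrition and an undecided major is positively correlated with it.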
6. Analysis
• Binary logistic regression analysis
– Binary selected because there are two outcomes: Return or Attrite
• Statistical package selected affects analysis
– SPSS requires every variable to have a value in order to include a case
(student) in the analysis
• If a case is missing a value for any variable, SPSS excludes it from the analysis
– Created a binary “Dataset” variable so the model was fit on the complete
cases that have an Attrition value (students from 2013 to 2015) and then
applied to the 2016 students, who do not yet have an Attrition value
• Saved Predicted values
– Analysis provided a predicted value for all students in the model
• Compared predicted values for each of the 2013 to 2015
cohorts to see how well the model fit with the students who
already left
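The workflow on this slide (drop incomplete cases, fit on the historical cohorts, save predicted values for the new cohort) can be sketched without SPSS. The example below is a minimal pure-Python logistic regression fit by gradient descent. The data, the two predictors, and the learning-rate settings are all hypothetical, and the fit only approximates what a statistical package would report.

```python
from math import exp


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def predict(weights, row):
    """Predicted probability of attrition for one row of predictors."""
    return sigmoid(weights[0] + sum(b * x for b, x in zip(weights[1:], row)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit [intercept, b1, ...] by batch gradient descent on the log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, target in zip(X, y):
            err = predict(weights, row) - target
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical rows: (cohort, hs_gpa, undecided, attrited); None = missing.
rows = [
    (2013, 2.4, 1, 1), (2013, 3.7, 0, 0), (2014, 3.3, 1, 1),
    (2014, 3.5, 0, 0), (2014, None, 0, 1), (2015, 3.1, 1, 0),
    (2015, 2.6, 0, 1), (2015, 3.9, 0, 0),
    (2016, 2.5, 1, None), (2016, 3.8, 0, None),
]

# Split like the "Dataset" flag: known outcomes vs. the cohort to score.
historical = [r for r in rows if r[3] is not None]
to_score = [r for r in rows if r[3] is None]

# Listwise deletion: drop any historical case with a missing predictor.
complete = [r for r in historical if None not in r[1:3]]

weights = fit_logistic([[r[1], r[2]] for r in complete], [r[3] for r in complete])

# Saved predicted values for the 2016 students.
predicted_2016 = [predict(weights, [r[1], r[2]]) for r in to_score]
```

Here one 2014 student is dropped for a missing GPA, and the low-GPA undecided 2016 student receives a higher predicted attrition value than the high-GPA declared one.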
7. Lists of Students
• Students with the highest predicted value for attrition were
identified for the 2016 cohort
• List of top 500 students was isolated and shared with the
Division of Student Success
• Using financial aid variables as well as the predicted attrition
variable, identified students who had the highest financial need
within the 2016 cohort
• List of top 500 students with the highest financial need shared
with Financial Aid
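Building the two outreach lists amounts to sorting the scored cohort and keeping the top of each ranking. The sketch below is hypothetical throughout: the records and field names are invented, and because the deck does not say how financial aid variables were combined with the predicted value, the need list here simply ranks by unmet need.

```python
# Hypothetical scored 2016 records; all names and numbers are illustrative.
scored = [
    {"id": "A", "predicted": 0.62, "unmet_need": 12000},
    {"id": "B", "predicted": 0.15, "unmet_need": 18000},
    {"id": "C", "predicted": 0.48, "unmet_need": 3000},
    {"id": "D", "predicted": 0.71, "unmet_need": 15000},
    {"id": "E", "predicted": 0.09, "unmet_need": 0},
]


def top_n(records, key, n):
    """Return the n records with the largest value of `key`."""
    return sorted(records, key=lambda r: r[key], reverse=True)[:n]


LIST_SIZE = 2  # the deck used 500; 2 keeps the toy example readable

# List for the Division of Student Success: highest predicted attrition.
risk_ids = [r["id"] for r in top_n(scored, "predicted", LIST_SIZE)]

# List for Financial Aid: highest unmet need (an assumed ranking rule).
need_ids = [r["id"] for r in top_n(scored, "unmet_need", LIST_SIZE)]
```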
8. Assessment of Model
• Identify 2016 cohort students who attrite from Fall to Spring
• Assessed the identified students' predicted scores from the two
models
• Identifying top predicted students in each cohort year and
comparing attrition rates for the two models
• Comparing attrition rates of the top predicted students from 2016
with those of the top predicted students from previous years
• Future: After Fall 2017 census, compare attrition of 2016
students who were contacted with attrition of the whole
class.
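One way to carry out comparisons like these is to compute the attrition rate among the top-ranked students and set it against the rate for the whole cohort. The sketch below uses an invented eight-student cohort; the ratio of the two rates (often called lift) is an added illustration, not a measure named in the deck.

```python
def attrition_rate(students):
    """Share of students whose 0/1 attrition flag is 1."""
    return sum(s["attrited"] for s in students) / len(students)


def top_k(students, k):
    """The k students with the highest predicted attrition value."""
    return sorted(students, key=lambda s: s["predicted"], reverse=True)[:k]


# Invented cohort: predicted value from the model and the observed outcome.
predicted = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.15, 0.1]
observed = [1, 1, 0, 1, 0, 0, 0, 0]
cohort = [{"predicted": p, "attrited": a} for p, a in zip(predicted, observed)]

overall_rate = attrition_rate(cohort)          # whole class
top_rate = attrition_rate(top_k(cohort, 4))    # top predicted students only

# A ratio above 1 means the model concentrates leavers in the top group.
lift = top_rate / overall_rate
```

Running the same calculation for each cohort year and for each model gives directly comparable numbers.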
9. Outreach Feedback
• Feedback from DSS and Financial Aid
– How many students were actually contacted?
• What were their difficulties contacting some students?
– Comments and suggestions by those who performed the outreach
• Were students already on advisors'/counselors' radar?
– How outreach was performed and by whom
– What outcomes happened after DSS outreach?
– Did FA outreach result in additional financial aid awards for the
following year?
10. Next steps
• Remove 2013 data from analysis
– BCSSE data is more complete beginning with the 2014 cohort, when it was
included in orientation
• Plans for Fall 2017 cohort
11. Additional Ideas
• What new variables can we add to the model?
– Grades from math courses or the first course in the major
– Blackboard engagement
Concerns?
Suggestions?
Questions?