2. INDEX
Items | Slide No
Title | 1
Problem Definition | 3
Solution | 4-11
Result | 12-17
Challenges | 18
Value | 19
Roadmap and Case Study | 20
3. DEFINING THE PROBLEM
Assumption:
Every year, 1,25,000 new students enroll in various offerings at an average course fee of Rs. 40,000.
Assuming 15% drop out, the loss is approx. Rs 75 Cr.
The cost of acquisition and the cost of retention should be added to this figure.
Once students drop out, we can only react, because we did not predict the dropout.
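The assumption above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `notional_loss_cr` is illustrative, the inputs are the figures stated in the assumption, and 1 crore is taken as 10,000,000 rupees.

```python
RUPEES_PER_CRORE = 10_000_000  # 1 Cr = 10^7 rupees


def notional_loss_cr(enrolled: int, avg_fee_rs: float, dropout_rate: float) -> float:
    """Return the fee revenue lost to dropouts, in crore rupees."""
    dropouts = enrolled * dropout_rate
    return dropouts * avg_fee_rs / RUPEES_PER_CRORE


# Figures from the assumption: 1,25,000 students, Rs 40,000 fee, 15% dropout.
loss = notional_loss_cr(enrolled=125_000, avg_fee_rs=40_000, dropout_rate=0.15)
print(f"Notional loss: Rs {loss:.0f} Cr")
```

Acquisition and retention costs are not included in this calculation, so the true loss would be higher.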
Problem: We are unable to predict a student's chances of dropping out at any stage of the Student Life Cycle. Hence we are reactive, but can we become PRO-ACTIVE? Two thoughts:
1. Prevention is better than cure.
2. Disaster management: at least be ready.
4. APPROACH TO SOLUTION - ANALYTICS
Large stores of data already exist at the University. By analyzing this data, the University can harness the power of analytics:
▪To Provide
▪Predictive view of upcoming challenges for the institution and for students
▪Information both at the course level and the programmatic level
▪Identify students at risk
▪To improve
▪Enrollment management
▪Student progress
▪Institutional finance and budgeting
▪Student achievement
▪Retention
▪Institutional accountability
▪To Develop
▪Student recruitment policies
▪Adjust course catalog offerings
▪Determine hiring needs
▪Make financial decisions
▪To Support
▪Optimal use of economic resources
▪Pedagogical resources
▪Offering a structure for improved educational outcomes
▪To Understand
▪Student behaviour online (through LMS usage)
▪Cost to complete a degree
5. Data is extracted from one or more systems (SAP, SIS)
The stored data is analyzed using statistical software, and a mathematical model is generated
Using the significant variables and statistical techniques such as logistic regression, decision trees, and neural networks, we were able to develop a single refined retention model
In Other Words:
The premise behind the retention model (RM) is fairly simple: use the wealth of data found at a university to determine, in real time, which students might be at risk, through analytics, mining and statistical techniques. The goal is to produce "Actionable Intelligence".
A predictive student success algorithm (SSA) is run: RM mines data from multiple sources and transforms it into a risk level, with supporting information, for each student.
The algorithm that predicts students’ risk statuses has two components:
1. Performance, measured by grades earned in the course to date.
2. Student demographics such as age, gender, employment etc.
Each component is weighted and fed into the proprietary algorithm, which then calculates a result for each student. Based on the results of the SSA, students are classified into buckets.
Based on the at-risk results, an Academic Alert Report (AAR) is shared with the Learning Centre (LC)/counsellor, and a particular action may be triggered, such as sending the student an electronic notification or initiating a personal intervention.
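The SSA itself is proprietary, so the following is only a hypothetical sketch of how two weighted components could be combined into one score and mapped to an action. The weights, the 0-to-1 risk scales, the cut-offs and all names (`Student`, `risk_score`, `suggested_action`) are assumptions made for illustration and do not come from the actual algorithm.

```python
from dataclasses import dataclass

# Hypothetical weights; the real SSA weights are proprietary and not given here.
WEIGHT_PERFORMANCE = 0.7
WEIGHT_DEMOGRAPHICS = 0.3


@dataclass
class Student:
    student_id: str
    performance_risk: float  # 0.0 (strong grades to date) .. 1.0 (failing)
    demographic_risk: float  # 0.0 (low-risk profile) .. 1.0 (high-risk profile)


def risk_score(s: Student) -> float:
    """Combine the two components into one weighted score in [0, 1]."""
    return (WEIGHT_PERFORMANCE * s.performance_risk
            + WEIGHT_DEMOGRAPHICS * s.demographic_risk)


def suggested_action(score: float) -> str:
    """Map a score to an action; the cut-offs are illustrative only."""
    if score >= 0.6:
        return "personal intervention"
    if score >= 0.3:
        return "electronic notification"
    return "no action"


students = [
    Student("S001", performance_risk=0.9, demographic_risk=0.5),
    Student("S002", performance_risk=0.4, demographic_risk=0.2),
    Student("S003", performance_risk=0.1, demographic_risk=0.1),
]
for s in students:
    score = risk_score(s)
    print(s.student_id, round(score, 2), suggested_action(score))
```

In practice the weights would be fitted from historical data rather than set by hand.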
[Figure: RETENTION MODEL diagram with steps numbered 1 to 7]
6. RETENTION MODEL
To predict possible dropouts, our model relies on:
▪Gradebook
▪LMS Usage
▪Past academic
▪Demographics
[Figure: "Our Model" diagram; legend: Data Available / Data currently Not Available]
7. OUR PROCESS
Used a data set of 95K currently enrolled students and manually classified 17K as probable dropouts
Employed Machine Learning (decision trees, neural networks) to train a model on part of the data set
The model was tested for accuracy and can be used to predict dropouts from university data sets
For now, the current list of 17K probable dropouts is being used to define an intervention process (targets and communication)
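The train-then-test process above can be sketched on synthetic data. This is not the model that was built: it swaps in a one-feature decision stump (a single threshold on backlog count) for the decision trees and neural networks actually employed. The synthetic records, the 80/20 split and the labeling rule are all assumptions.

```python
import random

random.seed(42)  # reproducible synthetic data


def make_record() -> tuple:
    """Return (backlogs, label); label 1 = tagged as probable dropout."""
    backlogs = random.randint(0, 6)
    # Synthetic rule: more backlogs -> more likely to be tagged as a dropout.
    label = 1 if random.random() < backlogs / 8 else 0
    return backlogs, label


records = [make_record() for _ in range(1000)]
random.shuffle(records)
split = int(0.8 * len(records))  # assumed 80/20 train/test split
train, test = records[:split], records[split:]


def accuracy(data, threshold: int) -> float:
    """Share of records where 'backlogs >= threshold' matches the label."""
    hits = sum((b >= threshold) == bool(y) for b, y in data)
    return hits / len(data)


# "Training": pick the threshold that classifies the training set best.
best_threshold = max(range(0, 8), key=lambda t: accuracy(train, t))

print("chosen threshold:", best_threshold)
print("held-out accuracy:", round(accuracy(test, best_threshold), 3))
```

Measuring accuracy only on the held-out records is what allows the tested model to be trusted on new university data sets.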
8. FOUR BROAD BUCKETS CREATED TO CLASSIFY STUDENTS:
Will Continue
Dropout Low
Dropout
Dropout High
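As a minimal sketch, a predicted dropout probability could be mapped to these four buckets as follows. The bucket names are the ones listed above; the probability cut-offs and the function name `bucket_for` are assumed for illustration, since the actual boundaries are not stated.

```python
import bisect

# Bucket names are from the slide; the probability cut-offs are assumed.
BUCKETS = ["Will Continue", "Dropout Low", "Dropout", "Dropout High"]
CUTOFFS = [0.25, 0.50, 0.75]  # hypothetical boundaries between buckets


def bucket_for(dropout_probability: float) -> str:
    """Return the bucket whose probability range contains the value."""
    if not 0.0 <= dropout_probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return BUCKETS[bisect.bisect_right(CUTOFFS, dropout_probability)]


for p in (0.10, 0.30, 0.60, 0.90):
    print(p, "->", bucket_for(p))
```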
11. CLASSIFICATION RESULTS
Class: Continuing (Blue), Dropout (Red)
For each input parameter, the following graphs show the dropout/continuing breakup for 82K of the 95K student records
Students can apply to the MBA in the 1st, 2nd or 3rd sem
18.6% of training records are dropouts
Dropout rates are highest in the 1st sem, lower for students in their 2nd sem, and lowest for those in their 3rd
[Figure panels, record counts per bar:
Applied semester: 68270, 13703, 27
Class: 66777, 15223
Current Sem: 20917, 29206, 31877]
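The 18.6% figure can be reproduced from the two Class counts above (66777 and 15223). This short check assumes the larger count is the Continuing class and the smaller one is Dropout, which is the reading consistent with the stated percentage.

```python
# Record counts for the Class panel (assumed: continuing vs dropout).
continuing = 66_777
dropout = 15_223

total = continuing + dropout
dropout_share = dropout / total

print(f"records: {total}")
print(f"dropout share: {dropout_share:.1%}")
```

The total comes to 82,000, matching the 82K records used for the graphs.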
12. CLASSIFICATION ON DEMOGRAPHICS (CONTD.)
Enrollment rates are higher for Non-employed, but dropout rates are higher for employed
No geographic trends seem apparent
Male enrollment and dropout rates are higher
Dropouts are least in 20-23 years age group
[Figure panels: Employment, State, Gender, Age. Recovered bar counts: 71343, 10653, 4; 61313, 20687]
13. CLASSIFICATION ON ASSESSMENT (CONTD.)
Sem-1 students with 6 backlogs have high dropout rates
Sem-1 students with an 'E' grade clearly have high dropout rates
Sem-2 students with 6 backlogs have high dropout rates; the 0-backlog group includes students still in Sem-1
Sem-2 students with an 'E' grade or Incomplete (mostly those still in Sem-1) have high dropout rates
[Figure panels: Sem 1 - Backlog, Sem 1 - Grade, Sem 2 - Backlog, Sem 2 - Grade. Recovered bar counts: 7570, 11962, 12155, 1514, 38, 48761; 3444, 6078, 6949, 923, 33501, 31105]
14. CLASSIFICATION ALGORITHM: NEURAL NETWORKS
Neural Network | true CONTINUING | true DROPOUT
pred. CONTINUING | 18695 | 2603 (false -ves)
pred. DROPOUT | 2376 (false +ves) | 2193
ANN Model Testing Results
89% accuracy
45-50% of dropouts predicted correctly
50-55% false -ves/+ves
About 1/2 of interventions will be targeted at appropriate candidates
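The headline figures can be recomputed from the four confusion-matrix counts above. This is a sketch using the standard definitions of accuracy, recall and precision; the variable names are illustrative. Recall of about 46% and precision of about 48% agree with the "45-50%" and "1/2" statements. Overall accuracy from these four counts comes to roughly 81%, so the 89% quoted above may refer to a different data split.

```python
# Counts from the ANN confusion matrix above.
tn = 18_695  # predicted continuing, truly continuing
fn = 2_603   # predicted continuing, truly dropout (false negatives)
fp = 2_376   # predicted dropout, truly continuing (false positives)
tp = 2_193   # predicted dropout, truly dropout

total = tn + fn + fp + tp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)     # share of real dropouts the model catches
precision = tp / (tp + fp)  # share of flagged students who really drop out

print(f"accuracy:  {accuracy:.1%}")
print(f"recall:    {recall:.1%}")
print(f"precision: {precision:.1%}")
```

For an intervention programme, recall and precision matter more than raw accuracy, because most students continue and a model that flags nobody would still look accurate.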
Artificial Neural Network (ANN) Model Training:
Training Data set reduced (~80%)
Trained model can be stored, retrieved and used for predictions
15. APPLICABILITY OF MODEL
▪The current model has been trained on distance learning students and is meant to identify students showing a high tendency/probability of dropping out
▪The model can be used, in its current form (and accuracy), on data sets from other courses to predict each student's dropout tendency and to plan a timely/pro-active intervention process accordingly
▪More data (e.g. LMS usage records) should increase model accuracy
Input-1 Age
Input-2 Exp
Input-3 Income
Input-4 Location
Input-5 Gap in Acad
Input-6 Job/Biz
Input-7 Marital St
Output-1 DO in 2nd 30%
Output-2 DO in 3rd 20%
Output-3 DO in 4th 40%
Output-4 DO in 5th 10%
Output-5 DO in 6th 30%
16. SUGGESTED ACTION: TARGETED INTERVENTION
~17K students from the current MBA pool of 95K students might drop out; dropouts are skewed toward 1st sem registrants, giving a notional loss of ~Rs 26.3K revenue per dropout; total revenue loss from MBA alone: ~Rs 45 Cr
Timely/pro-active intervention for all distance learning students should reduce notional losses
Targeted intervention can be carried out over 2 channels:
Learning Centre-driven f2f counseling
University-driven personalized email and telephonic counseling
17. CHALLENGES
▪Data mining and predictive modeling are affected by input data of diverse quality
▪A predictive model is usually as good as its training data
▪So getting the data is a challenge
▪Good: lots of data
▪Not so good: Data Quality Issues
▪LMS usage (missing data)
▪Whether management is open to new approaches and ideas
18. VALUE - REVENUE PROJECTION
Revenue Item | Amount
Notional Rev Loss (17K students, could have paid Rs 26.3K in tuition each - skewed towards 1st sem dropouts) | Rs 45 Cr
Conservative Conversion Rate | 30%
Incremental Revenue | Rs 13.5 Cr
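The projection in the table can be traced with a short calculation. This is a sketch using the stated inputs; the variable names are illustrative, and 1 crore is taken as 10,000,000 rupees. The unrounded loss is about Rs 44.7 Cr, which the table rounds to Rs 45 Cr before applying the 30% conversion rate to reach Rs 13.5 Cr.

```python
RUPEES_PER_CRORE = 10_000_000

probable_dropouts = 17_000
tuition_per_dropout_rs = 26_300
conversion_rate = 0.30  # share of probable dropouts retained by intervention

notional_loss_cr = probable_dropouts * tuition_per_dropout_rs / RUPEES_PER_CRORE
incremental_revenue_cr = notional_loss_cr * conversion_rate

print(f"Notional loss: Rs {notional_loss_cr:.1f} Cr")
print(f"Incremental revenue: Rs {incremental_revenue_cr:.1f} Cr")
```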
After successfully identifying at-risk students and applying the intervention process to the first data set (MBA students), a projected curtailment of revenue loss can be seen
After successful results in one offering, this process can be replicated/expanded to other disciplines and areas
As student numbers grow, so will the curtailment of revenue loss every year
19. CASE STUDY – ROAD MAP
▪Build a team of 3
▪Assign someone from the operations team to help us get the data
▪Data Collection: 250 students per state per stream for the last 4 drives
▪Capture as many input data points as possible (mostly demographic)
▪Analysis
▪Select the 5-6 most effective parameters
▪Result
▪Example: An MBA student from rural Maharashtra with 5 years of exp and a monthly per capita income of 25000 will have a 25% chance of dropping out in the 3rd sem.
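The data collection step in the roadmap (250 students per state per stream) amounts to a stratified sample. The following is a minimal sketch under stated assumptions: the record fields (`id`, `state`, `stream`), the function name `stratified_sample` and the example records are hypothetical, and the "last 4 drives" dimension is left out for brevity.

```python
import random
from collections import defaultdict

SAMPLE_PER_GROUP = 250  # from the roadmap: 250 students per state per stream


def stratified_sample(records, per_group=SAMPLE_PER_GROUP, seed=0):
    """Pick up to `per_group` records for each (state, stream) pair."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["state"], rec["stream"])].append(rec)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        k = min(per_group, len(members))  # small groups are taken whole
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical records; a real run would read them from the student systems.
records = [
    {"id": i, "state": state, "stream": stream}
    for i, (state, stream) in enumerate(
        [("Maharashtra", "MBA")] * 400 + [("Kerala", "MBA")] * 100
    )
]
picked = stratified_sample(records)
print(len(picked))
```

Sampling per group keeps small states from being swamped by large ones when the 5-6 most effective parameters are selected.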
20. THANK YOU
To ride on the next wave of Quality Private Education in India and Abroad… ~ Paper Planes