1. A Learning Objective-Based Model to Predict Students’ Success in a First-Year Engineering Course
Farshid Marbouti
Committee: Dr. Diefes-Dux, Dr. Madhavan, Dr. Main, Dr. Ohland
January 2016
4. Solution
Analyze students’ performance data (Huang & Fang, 2012; White, 2012)
Predict students’ success in a course and identify at-risk students (Jin, Imbrie, Lin, & Chen, 2011; Olani, 2009)
Use the prediction model as an early warning system
Inform both the instructor and the students of their performance (Arnold & Pistilli, 2012b; Essa & Hanan, 2012; Macfadyen & Dawson, 2010)
5. Shortcomings of Early Warning Systems
Use the same model for all courses
  - A generic model decreases accuracy
  - Not useful for non-traditional courses
Do not reap the benefits of standards-based grading
  - More data points
  - Higher reliability in grading
Depend on online access data
  - May not be suitable for face-to-face courses
  - Some courses may not use the school’s CMS, so instructors do not have access to these data
6. Research Questions (1/3)
Of six different predictive modeling methods, as well as a seventh hybrid or Ensemble method, which is the most successful at identifying at-risk students, based on specified in-semester student performance data? Why is this method the most successful? Why are the other methods less successful?
7. Research Questions (2/3)
To what extent can the models created by predictive methods for identifying at-risk students in a course be improved through the selection of in-semester student performance data (e.g., quiz, homework learning objectives, midterm exam)? What does the selection reveal?
8. Research Questions (3/3)
What are the relationships, if any, between students’ success and achievement of different learning objectives in a course? What are the implications for the resulting prediction models and what are the pedagogical implications?
13. Research Flowchart
[Flowchart, reconstructed as an outline. Data: Spring 2013 (train/verify) and Spring 2014 (test).]

Modeling methods (6): Logistic Regression, Multi-Layer Perceptron, Support Vector Machine, K-Nearest Neighbor, Decision Tree, Naive Bayes Classifier
Feature selection methods (3): Correlations, Explained Variance, Gini Gain

1. Model Development
  - Train and verify the 6 models
  - Error analysis
  - Create an ensemble model
  -> Top 2 modeling methods
2. Feature Selection
  - Train and verify the top 2 models
  - Use different numbers of variables
  -> Top variables
3. Final Test
  - Test the top 2 models
  - Select the optimal number of variables
  - Model Robustness: randomly cluster the training data; train the top 2 models with different numbers of clusters; verify the models
  - Assessment Types: train the top 2 models with subsets of the data; verify the models
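As a rough illustration of step 1, the sketch below trains the six classifiers and a majority-vote ensemble with scikit-learn. X_train/y_train and X_verify/y_verify (in-semester grade features and a binary at-risk label) are assumed inputs, and the hyperparameters are illustrative, not the study’s actual settings.

```python
# Sketch of the model-development step: train six classifiers and a
# majority-vote ensemble, then compare them on a held-out verify set.
# X_train/y_train and X_verify/y_verify (1 = at-risk) are assumed inputs.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import classification_report

models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000),
    "SVM": SVC(),                                  # illustrative defaults
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(max_depth=4),
    "NBC": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_verify, model.predict(X_verify)))

# Ensemble: hard (majority) voting over the six base models
ensemble = VotingClassifier(estimators=list(models.items()), voting="hard")
ensemble.fit(X_train, y_train)
print(classification_report(y_verify, ensemble.predict(X_verify)))
```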
17. R.Q. 1 – Best Prediction Model(s)
Best model(s): Naive Bayes Classifier (NBC) & Ensemble
A low percentage of students fail -> small sample of at-risk students
Bias: inaccurate assumptions in the learning algorithm
Variance: sensitivity of the model to small changes in the training data
Models with high bias/low variance perform better on small samples
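The bias/variance point can be illustrated on synthetic data: with a small, imbalanced sample (~10% at-risk), a high-bias learner such as NBC tends to recall the minority class more stably than a high-variance learner such as an unconstrained decision tree. A minimal sketch, with synthetic data standing in for the course data:

```python
# Illustrative (synthetic-data) check of the bias/variance point: a small,
# imbalanced sample of 120 students with ~10% in the "at-risk" class.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=120, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

for name, model in [("NBC", GaussianNB()),            # high bias, low variance
                    ("DT", DecisionTreeClassifier())]:  # low bias, high variance
    # recall on the minority (at-risk) class, averaged over CV folds
    scores = cross_val_score(model, X, y, cv=5, scoring="recall")
    print(f"{name}: mean recall {scores.mean():.2f} ± {scores.std():.2f}")
```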
18. Predictive Power of Assessments
[figure: predictive power of each assessment type]
19. Feature Selection - Final Test
[figure: final-test results for the selected features]
20. R.Q. 2 – Data Selection
Models with only two variables performed well
Simple models have high bias/low variance
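A minimal sketch of picking a small number of predictors in the spirit of the study’s three criteria (correlation, explained variance via ANOVA F-scores, and Gini gain); X_train, y_train, and feature_names are assumed inputs, and the exact scoring used in the dissertation may differ:

```python
# Rank features by three criteria and keep the top 2 under each.
# X_train (numpy array), y_train, and feature_names are assumed inputs.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.tree import DecisionTreeClassifier

# 1) absolute correlation of each feature with the pass/fail label
corr = [abs(np.corrcoef(X_train[:, j], y_train)[0, 1])
        for j in range(X_train.shape[1])]

# 2) explained variance via ANOVA F-scores
f_scores, _ = f_classif(X_train, y_train)

# 3) Gini gain via a decision tree's impurity-based importances
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
gini = tree.feature_importances_

for name, scores in [("corr", corr), ("F", f_scores), ("gini", gini)]:
    top2 = np.argsort(scores)[::-1][:2]
    print(name, [feature_names[j] for j in top2])
```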
22. R.Q. 3 – Learning Objectives
Identify potential threshold learning objectives
Week 5 was important for students’ success
  - Topic: user-defined functions in MATLAB
  - The topic is important for the rest of the semester
  - It is the first difficult topic that can differentiate students
  - Students start to take the course seriously from week 5 on
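One simple way to quantify such a relationship is a point-biserial correlation between a learning-objective score and the pass/fail outcome. The sketch below is hypothetical: the file and column names are placeholders, not the study’s actual data.

```python
# Hypothetical check of one learning objective's relationship to success:
# point-biserial correlation between week-5 scores and pass/fail.
# "grades.csv", "passed", and "week5_functions_LO" are placeholder names.
import pandas as pd
from scipy.stats import pointbiserialr

grades = pd.read_csv("grades.csv")
r, p = pointbiserialr(grades["passed"], grades["week5_functions_LO"])
print(f"point-biserial r = {r:.2f} (p = {p:.3f})")
```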
24. Recommendations
Minimum class size: 120 students, with ~10% at-risk students
All models have error: communicate results with this in mind
Use at least two semesters of data to train and test the models
Avoid drastic changes in course structure from one semester to the next
Be mindful of false negative (Type II) errors
The process can reveal relationships between assessments and success
Recommended modeling methods by class size and at-risk percentage:

                               % of at-risk students
                               low                 high
# of students    low           Know the students
in the course    high          SVM, NBC            KNN, DT, MLP
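A minimal sketch of checking the Type II (false negative) error, under the assumption that label 1 marks at-risk students and that y_test and predictions come from the final test step:

```python
# A false negative here is an at-risk student the model misses.
# y_test and predictions are assumed inputs; label 1 = at-risk.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
print(f"false negatives (missed at-risk students): {fn}")
print(f"recall on at-risk students: {tp / (tp + fn):.2f}")
```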
26. Future Work
How can the models be used in practice?
Predict students’ performance during the semester
Investigate what leads students to success in a course
27. Thank You…
My advisors: Dr. Diefes-Dux & Dr. Madhavan
My committee members: Dr. Ohland and Dr. Main
My wife
Friends who were part of my journey
29. Research Flowchart (detailed)

[Flowchart, reconstructed as an outline. Data: Spring 2013 and Spring 2014.]

Data Cleaning
Train/Verify/Test Datasets: randomly divide the Spring 2013 data into 50% train and 25%/25% verify sets; the Spring 2014 data serve as the test set

Modeling methods (6): Logistic Regression, MLP, SVM, KNN, DT, NBC
Feature selection methods (3): Correlations, Explained Variance, Gini Gain

1. Model Development
  - Feature Selection: select variables
  - Train: train the 6 models
  - Verify Models: verify the 6 models using the verify1 data
  - Error Analysis: compare the models; analyze the errors
  - Create Ensemble Model: train/verify the ensemble model; select the top 2 of the 7 models
  -> Top 2 modeling methods
2. Feature Selection
  - Train: train the top 2 models with different numbers of variables
  - Verify Models: verify the top 2 models using the verify2 data
  - Variable Selection: select the optimal number of predictors
  -> Top variables
3. Final Test
  - Test: test the top 2 models using the test dataset
  - Model Robustness: randomly cluster the training data (Cluster Data); train the top 2 models with different numbers of clusters; verify the models
  - Assessment Types: train the top 2 models with subsets of the data; verify the models
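A minimal sketch of the data setup and the robustness check in this flowchart, assuming sp2013 and sp2014 are pandas DataFrames with an at_risk label; the “random clustering” is read here as a random partition of the training rows, which may differ from the dissertation’s exact procedure.

```python
# Spring 2013 split 50/25/25 into train/verify1/verify2; Spring 2014 is the
# held-out test set. sp2013 and sp2014 are assumed DataFrames with an
# "at_risk" column.
import numpy as np
from sklearn.model_selection import train_test_split

train, rest = train_test_split(sp2013, train_size=0.50, random_state=0,
                               stratify=sp2013["at_risk"])
verify1, verify2 = train_test_split(rest, train_size=0.50, random_state=0,
                                    stratify=rest["at_risk"])
test = sp2014  # the following semester serves as the final test set

# Robustness check: randomly partition the training rows into k clusters and
# retrain per cluster; stable verify results suggest the models are not
# overly sensitive to which students they were trained on.
rng = np.random.default_rng(0)
for k in (2, 3, 4):
    labels = rng.integers(0, k, size=len(train))   # random cluster assignment
    print(k, np.bincount(labels))  # cluster sizes; per-cluster training omitted
```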