1. Team: NCCU
A Linear Ensemble of Classification Models
with Novel Backward Cumulative Features
for MOOC Dropout Prediction
Chih-Ming Chen, Man-Kwan Shan,
Ming-Feng Tsai, Yi-Hsuan Yang,
Hsin-Ping Chen, Pei-Wen Yeh,
and Sin-Ya Peng
Research Center for
Information Technology Innovation,
Academia Sinica
Department of Computer Science,
National Chengchi University
2. Team: NCCU
— a linear combination of several models
— the proposed data engineering method can generate many distinct feature sets
3. Key Point Summary
Latent Space Representation
— Clustering Model
— Skip-Gram Model
→ alleviates the feature sparsity problem
Backward Cumulative Features
— generate 30 distinct sets of features
→ alleviates the bias problem of statistical features
Linear Model + Tree-based Model
→ a good match: the linear model compensates for the tree-based model's weakness on sparse features
4. Workflow
Train Data → split into Train (75%) and Validate (25%)
0"
50000"
100000"
150000"
200000"
10/27/2013"
11/27/2013"
12/27/2013"1/27/2014"2/27/2014"
3/27/2014"4/27/2014"
5/27/2014"6/27/2014"
7/27/2014"
Training'Date'Distribu.on
0"
20000"
40000"
60000"
80000"
100000"
120000"
140000"
10/27/2013"
11/27/2013"
12/27/2013"1/27/2014"2/27/2014"
3/27/2014"4/27/2014"
5/27/2014"6/27/2014"
7/27/2014"
Tes.ng'Date'Distribu.on
Split the training data according to the time distribution to obtain stable results (sketched below).
— 2 settings
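A minimal sketch of one plausible reading of this split, assuming a pandas DataFrame of logs with an `enrolment_id` and a parsed `time` column (both names are mine, not the competition's schema):

```python
import pandas as pd

def time_based_split(logs: pd.DataFrame, train_frac: float = 0.75):
    """Split enrolments by time so the validation set covers the later
    part of the date distribution (hypothetical reading of the slide)."""
    # Order enrolments by their last activity timestamp.
    last_seen = logs.groupby("enrolment_id")["time"].max().sort_values()
    cutoff = int(len(last_seen) * train_frac)
    train_ids = set(last_seen.index[:cutoff])   # earlier 75%
    valid_ids = set(last_seen.index[cutoff:])   # later 25%
    return train_ids, valid_ids
```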
8. Prediction Model Overview
Raw Data (Student / Course / Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
A classical approach to a general prediction task.
— 2 solutions
9. Prediction Model Overview
Raw Data (Student / Course / Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Feature engineering tailored to the MOOC dataset.
— 2 solutions
10. Prediction Model Overview
Raw Data (Student / Course / Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Both paths → Linear Combination → Final Prediction
— 2 solutions
11. Prediction Model Overview
Solution 1 (scikit-learn): Raw Data (Student / Course / Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Solution 2 (xgboost): Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Both solutions → Linear Combination → Final Prediction
— 2 solutions
http://scikit-learn.org/stable/
https://github.com/dmlc/xgboost
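A rough sketch of the two solutions' model stacks with the named libraries; all hyper-parameters are placeholders, since the slides do not give the team's actual settings:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
import xgboost as xgb

# Solution 1 (scikit-learn): classical classifiers on the extracted features.
solution1_models = [
    LogisticRegression(max_iter=1000),
    GradientBoostingClassifier(n_estimators=300),
    SVC(probability=True),   # probability=True enables predict_proba
]

# Solution 2 (xgboost): gradient boosting decision trees on the
# backward cumulative features.
solution2_model = xgb.XGBClassifier(n_estimators=500, learning_rate=0.05)
```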
12. Prediction Model Overview
Same pipeline as above; this step focuses on Feature Extraction / Feature Engineering.
— 2 solutions
14. Feature Extraction
Raw Data (Student / Course / Time) → Enrolment ID
• Bag-of-words Feature
— Boolean (0/1)
— Term Frequency (TF)
• Probability Value
— Ratio
— Naive Bayes
• Latent Space Feature
— Clustering
— DeepWalk
Bag-of-words features describe the status, e.g. the month of the course, the number of registrations.
Example TF vector: video 5, problem 10, wiki 0, discussion 2, navigation 0 (see the sketch below).
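A minimal sketch of the bag-of-words event features, assuming a logs DataFrame with `enrolment_id` and `event` columns (illustrative names):

```python
import pandas as pd

def bag_of_words_features(logs: pd.DataFrame) -> pd.DataFrame:
    # Term Frequency: count each event type per enrolment, e.g.
    # video=5, problem=10, wiki=0, discussion=2, navigation=0.
    tf = logs.pivot_table(index="enrolment_id", columns="event",
                          aggfunc="size", fill_value=0)
    boolean = (tf > 0).astype(int)   # the Boolean (0/1) variant
    return tf.join(boolean, rsuffix="_bool")
```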
15. Feature Extraction
• Probability Value
— Ratio, e.g. the dropout ratio of the course
— Naive Bayes: dropout probability, scored as
P(dropout | containing objects) := P(O_1 | dropout) × … × P(O_d | dropout),
where O = {O_1, O_2, …, O_d} is the set of objects the enrolment contains; each probability is estimated from the observed data (sketched below).
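A sketch of how the Naive Bayes score could be estimated from observed data; `enrol_objects` (enrolment → set of objects) and `dropout` (enrolment → 0/1 label) are hypothetical inputs, and the Laplace smoothing is my addition:

```python
from collections import Counter

def nb_dropout_scores(enrol_objects, dropout, alpha=1.0):
    drop_ids = [e for e in enrol_objects if dropout[e] == 1]
    n_drop = len(drop_ids)
    # In how many dropout enrolments does each object appear?
    counts = Counter(o for e in drop_ids for o in enrol_objects[e])

    def score(objects):
        # P(O_1|dropout) x ... x P(O_d|dropout), with smoothing.
        p = 1.0
        for o in objects:
            p *= (counts[o] + alpha) / (n_drop + 2 * alpha)
        return p

    return {e: score(objs) for e, objs in enrol_objects.items()}
```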
16. Feature Extraction
• Latent Space Feature
— Clustering: latent topics via K-means on (1) registered courses and (2) containing objects
— DeepWalk / Skip-Gram: some features are sparse; used to obtain a dense feature representation
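A short sketch of the clustering-based latent feature, assuming a binary enrolment-by-object matrix; the matrix here is random stand-in data and k = 20 is a placeholder:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.randint(0, 2, size=(1000, 200))  # stand-in for the real matrix
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)

cluster_id = km.labels_      # categorical latent-topic feature per enrolment
distances = km.transform(X)  # dense distance-to-centroid features
```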
18. DeepWalk
https://github.com/phanein/deepwalk
The goal: find a representation of each node in a graph.
It is an extension of word2vec's Skip-Gram model.
The core idea is to model context information (in practice, a node's neighbours).
Similar objects are mapped to nearby points in the latent space.
19. From DeepWalk to the MOOC Problem
[Figure: bipartite graph of users U1–U3 and Courses A–E]
https://github.com/phanein/deepwalk
20. From DeepWalk to the MOOC Problem
[Figure: random walks over the user–course graph, passing through U1, U2, Course B, Course C, Course E]
Treat random walks on the heterogeneous graph as sentences.
https://github.com/phanein/deepwalk
21. From DeepWalk to the MOOC Problem
[Figure: the same random walks, now mapped to dense vectors]
Treat random walks on the heterogeneous graph as sentences.
Resulting embeddings, e.g. U1: [0.3, 0.2, -0.1, 0.5, -0.8], Course B: [0.1, 0.3, -0.5, 1.2, -0.3]
https://github.com/phanein/deepwalk
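A condensed DeepWalk-style sketch on a toy user–course graph, using gensim's skip-gram Word2Vec in place of the deepwalk package itself; the edges and all parameters are illustrative:

```python
import random
from gensim.models import Word2Vec

edges = [("U1", "Course A"), ("U1", "Course B"), ("U2", "Course B"),
         ("U2", "Course C"), ("U3", "Course D"), ("U3", "Course E")]
adj = {}
for u, c in edges:                    # bipartite adjacency lists
    adj.setdefault(u, []).append(c)
    adj.setdefault(c, []).append(u)

def random_walk(start, length=10):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# Treat the random walks as "sentences" and learn skip-gram embeddings.
walks = [random_walk(node) for node in adj for _ in range(20)]
model = Word2Vec(walks, vector_size=5, window=3, sg=1, min_count=1)
print(model.wv["U1"])                 # a dense 5-dimensional vector
```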
23. Backward Cumulative Features — Motivation
Logs Table (rows: users; columns: days 10/13 – 10/27; O = has logs that day, X = no logs):
U1: O X O X O O X O X X X X X X O
U2: X X X X O X X X X X O O O X X
U3: X X X X X X X X X X X O X O X
24. Backward Cumulative Features — Motivation
Same logs table: users are active over different periods and with different numbers of logs.
25. Backward Cumulative Features
Consider only the logs in the last N days, applied to the logs table above. N=2
26. Backward Cumulative Features
Consider only the logs in the last N days. N=3
27. Backward Cumulative Features
Consider only the logs in the last N days. N=4
28. Backward Cumulative Features
Consider only the logs in the last N days. N=5
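A sketch of how the 30 backward cumulative feature sets could be generated, assuming a logs DataFrame with `enrolment_id`, `event`, and a datetime `time` column (illustrative names):

```python
import pandas as pd

def backward_cumulative_sets(logs: pd.DataFrame, max_n: int = 30):
    end = logs["time"].max()          # last day of the observed period
    feature_sets = {}
    for n in range(1, max_n + 1):
        # Keep only the logs from the last N days, then re-count events.
        recent = logs[logs["time"] >= end - pd.Timedelta(days=n)]
        feature_sets[n] = recent.pivot_table(
            index="enrolment_id", columns="event",
            aggfunc="size", fill_value=0)
    return feature_sets               # one distinct feature set per N
```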
32. Prediction Model Overview
Solution 1 (scikit-learn): Raw Data (Student / Course / Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Solution 2 (xgboost): Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Final Prediction = 0.5 × solution 1 + 0.5 × solution 2 (linear combination)
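The final blend itself is a one-liner with the slide's equal weights; the probability arrays below are stand-ins for the two solutions' outputs:

```python
import numpy as np

solution1_proba = np.array([0.9, 0.2, 0.6])  # stand-in predictions
solution2_proba = np.array([0.8, 0.3, 0.4])

final_prediction = 0.5 * solution1_proba + 0.5 * solution2_proba
```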
33. What We Learned from the Competition
• Teamwork is important
— share ideas
— share solutions
• Model diversity & feature diversity
— diverse models / features can capture different characteristics of the data
• Understand the data
— the goal
— the evaluation metric
— the data structure
34. What We Learned from the Competition (cont.)
Start earlier …
35. What We Learned from the Competition (cont.)
Several things need to be worked out early, e.g. feature format, data partition, feature scale.