KDD CUP 2015 - 9th solution

  1. Team: NCCU
     A Linear Ensemble of Classification Models with Novel Backward Cumulative Features for MOOC Dropout Prediction
     Chih-Ming Chen, Man-Kwan Shan, Ming-Feng Tsai, Yi-Hsuan Yang, Hsin-Ping Chen, Pei-Wen Yeh, and Sin-Ya Peng
     Research Center for Information Technology Innovation, Academia Sinica
     Department of Computer Science, National Chengchi University
  2. (Title slide, annotated.) Key ideas: a linear combination of several models, plus a proposed data engineering method that can generate a number of distinct feature sets.
  3. Key Point Summary
     Latent Space Representation
     — Clustering Model
     — Skip-Gram Model
     (alleviates the feature sparsity problem)
     Backward Cumulative Features
     — Generate 30 distinct sets of features
     (alleviates the bias problem of statistical features)
     Linear Model + Tree-based Model
     (a good match; covers the weakness when using sparse features)
  4. Workflow
     Train Data (75%) / Validate Data (25%)
     Split the training data based on the time distribution, for stable results. (2 settings)
     [Charts: Training Date Distribution and Testing Date Distribution, 10/27/2013 through 7/27/2014]
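The slide does not spell out the split procedure. Below is a minimal sketch of one way to do a time-based 75/25 split, assuming enrolments are ordered by their first activity time; all names and data are toy stand-ins, not the team's code.

    import pandas as pd

    # Toy enrolment start times; column names are hypothetical.
    enrolments = pd.DataFrame({
        "enrollment_id": [1, 2, 3, 4],
        "first_log_time": pd.to_datetime(["2013-11-01", "2013-12-15",
                                          "2014-02-10", "2014-05-03"]),
    })

    # Sort by time and cut at the 75% point so the validation part
    # mimics the later test-date distribution.
    ordered = enrolments.sort_values("first_log_time")
    cut = int(len(ordered) * 0.75)
    train_ids = ordered["enrollment_id"].iloc[:cut]
    valid_ids = ordered["enrollment_id"].iloc[cut:]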
  5. Workflow, submission method 1 (cross-validation)
     Train Data (75%) → Learned Model → Offline Evaluation on Validate Data (25%)
     Train Data → Submission on Test Data
     (2 settings; check if it leads to better performance)
  6. Workflow, submission method 2
     Train Data (75%) → Learned Model → Offline Evaluation on Validate Data (25%)
     Train Data → Learned Model → Submission on Test Data
     (2 settings; check if it leads to better performance)
  7. Workflow
     The complete pipeline: time-based split, offline evaluation, then submission. (2 settings)
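A toy sketch of how the two submission methods above differ, with random arrays standing in for the real feature matrices; the classifier choice and shapes are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    X_tr, y_tr = rng.normal(size=(750, 20)), rng.integers(0, 2, 750)  # 75% split
    X_va, y_va = rng.normal(size=(250, 20)), rng.integers(0, 2, 250)  # 25% split
    X_test = rng.normal(size=(300, 20))                               # test period

    model = LogisticRegression(max_iter=1000)

    # Offline evaluation: fit on the 75% split, score on the 25% split.
    model.fit(X_tr, y_tr)
    print("offline AUC:", roc_auc_score(y_va, model.predict_proba(X_va)[:, 1]))

    # Method 1: submit predictions from the split-trained model.
    pred_method_1 = model.predict_proba(X_test)[:, 1]

    # Method 2: retrain on all training data, then predict the test set.
    model.fit(np.vstack([X_tr, X_va]), np.concatenate([y_tr, y_va]))
    pred_method_2 = model.predict_proba(X_test)[:, 1]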
  8. Prediction Model Overview (2 solutions)
     Raw Data (Student, Course, Time) → Features → Logistic Regression, Gradient Boosting Classifier, Support Vector Classifier
     A classical approach to a general prediction task.
  9. Prediction Model Overview (2 solutions)
     Adds Gradient Boosting Decision Trees fed by Backward Cumulation, the feature engineering tailored to the MOOC dataset.
  10. Prediction Model Overview (2 solutions)
      The two branches are merged by a Linear Combination into the Final Prediction.
  11. Prediction Model Overview (2 solutions)
      Solution 1: Logistic Regression, Gradient Boosting Classifier, Support Vector Classifier, built with scikit-learn (http://scikit-learn.org/stable/).
      Solution 2: Gradient Boosting Decision Trees with Backward Cumulation, built with xgboost (https://github.com/dmlc/xgboost).
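A sketch of how the two solutions' model sets might be instantiated with the named libraries; the hyperparameters are illustrative assumptions, not the team's settings.

    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.svm import SVC
    import xgboost as xgb

    # Solution 1: scikit-learn classifiers over the extracted features.
    solution_1 = [
        LogisticRegression(max_iter=1000),
        GradientBoostingClassifier(n_estimators=300),
        SVC(probability=True),  # probability=True so scores can be blended
    ]

    # Solution 2: gradient boosting decision trees (xgboost) over the
    # backward cumulative features.
    solution_2 = xgb.XGBClassifier(n_estimators=500, max_depth=6)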
  12. Prediction Model Overview (2 solutions)
      The Feature Extraction / Feature Engineering stage produces the inputs to both solutions.
  13. Feature Extraction
      Raw Data: Student, Course, Time → Enrolment ID
      • Bag-of-words Feature
      — Boolean (0/1)
      — Term Frequency (TF)
      • Probability Value
      — Ratio
      — Naive Bayes
      • Latent Space Feature
      — Clustering
      — DeepWalk
  14. Feature Extraction (bag-of-words)
      Describes the status, e.g. the month of the course, the number of registrations.
      Example event counts for one enrolment: video 5, problem 10, wiki 0, discussion 2, navigation 0.
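A minimal sketch of the TF and Boolean variants over such event counts; the data and column names are toy stand-ins.

    import pandas as pd

    # Toy log records; column names are hypothetical.
    logs = pd.DataFrame({
        "enrollment_id": [1, 1, 1, 2, 2],
        "event": ["video", "problem", "video", "wiki", "discussion"],
    })

    tf = pd.crosstab(logs["enrollment_id"], logs["event"])  # Term Frequency (TF)
    boolean = (tf > 0).astype(int)                          # Boolean (0/1)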
  15. Feature Extraction (probability value)
      Estimate the probability from the observed data, e.g. the dropout ratio of the course.
      For the objects an enrolment contains, O = {O_1, O_2, …, O_d}:
      P(dropout | O) := P(O_1 | dropout) · P(O_2 | dropout) · … · P(O_d | dropout)
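A toy sketch of this probability feature; the add-one smoothing and the log-space sum are my assumptions, not stated on the slide.

    import numpy as np

    # Toy data: rows are enrolments, columns are objects (1 = enrolment
    # contains the object); labels mark observed dropout.
    contains = np.array([[1, 0, 1],
                         [1, 1, 0],
                         [0, 1, 1],
                         [1, 1, 1]])
    dropout = np.array([1, 1, 0, 0])

    # Estimate P(O_i | dropout) from the observed data.
    p_obj_given_drop = (contains[dropout == 1].sum(axis=0) + 1) / (dropout.sum() + 2)

    def nb_feature(row):
        # Sum of log conditionals over the objects this enrolment contains
        # (log-space form of the product on the slide, to avoid underflow).
        return np.log(p_obj_given_drop[row == 1]).sum()

    scores = np.array([nb_feature(row) for row in contains])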
  16. Feature Extraction (latent space)
      Some features are sparse, so derive dense representations:
      — Latent Topic: K-means clustering on (1) registered courses, (2) contained objects
      — DeepWalk / Skip-Gram for obtaining a dense feature representation
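A sketch of the clustering side, assuming the cluster assignment (or the distances to the centroids) serves as the dense feature; K and the input matrix are stand-ins.

    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in for the sparse bag-of-words rows (counts over courses/objects).
    X_sparse = np.random.default_rng(0).poisson(1.0, size=(1000, 50)).astype(float)

    km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X_sparse)
    cluster_id = km.labels_                # categorical latent-topic feature
    cluster_dist = km.transform(X_sparse)  # dense distances to each centroid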

  17. DeepWalk (https://github.com/phanein/deepwalk)
      The goal: find the representation of each node of a graph. It is an extension of word2vec's Skip-Gram model.
  18. DeepWalk (https://github.com/phanein/deepwalk)
      The core is to model the context information (in practice, a node's neighbours). Similar objects are mapped to nearby points in the latent space.
  19. From DeepWalk to the MOOC Problem (https://github.com/phanein/deepwalk)
      Diagram: a user-course graph with users U1, U2, U3 and Courses A through E.
  20. From DeepWalk to the MOOC Problem (https://github.com/phanein/deepwalk)
      Diagram: a random walk over the graph (visiting U1, Course B, Course E, Course C, U2, U1).
      Treat random walks on the heterogeneous graph as sentences.
  21. From DeepWalk to the MOOC Problem (https://github.com/phanein/deepwalk)
      Treat random walks on the heterogeneous graph as sentences; each node then gets a dense vector, e.g.
      U1: [0.3, 0.2, -0.1, 0.5, -0.8]
      Course B: [0.1, 0.3, -0.5, 1.2, -0.3]
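A DeepWalk-style sketch on a toy version of this graph, using gensim's Word2Vec with sg=1 for the Skip-Gram step; the walk count, walk length, and vector size are arbitrary assumptions.

    import random
    from gensim.models import Word2Vec

    # Toy user-course graph mirroring the diagram (adjacency as a dict).
    graph = {
        "U1": ["Course A", "Course B"],
        "U2": ["Course B", "Course C"],
        "U3": ["Course C", "Course D", "Course E"],
        "Course A": ["U1"], "Course B": ["U1", "U2"],
        "Course C": ["U2", "U3"], "Course D": ["U3"], "Course E": ["U3"],
    }

    def random_walk(start, length=10):
        walk = [start]
        for _ in range(length - 1):
            walk.append(random.choice(graph[walk[-1]]))
        return walk

    # Each walk is a "sentence"; Skip-Gram (sg=1) learns node vectors.
    walks = [random_walk(node) for node in graph for _ in range(20)]
    model = Word2Vec(walks, vector_size=16, window=5, sg=1, min_count=1)
    u1_vector = model.wv["U1"]  # dense latent feature for user U1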
  22. Performance (incremental results, AUC)
      Bag-of-words: > 0.890
      + Probability: > 0.901
      + Probability + Naive Bayes: > 0.902
      + Probability + Naive Bayes + Latent Space: > 0.903
      Backward Cumulation and the models' combination follow (scores not shown here).
  23. Backward Cumulative Features — Motivation
      Logs table (O = has log, X = no log), columns 10/13 through 10/27:
      U1: O X O X O O X O X X X X X X O
      U2: X X X X O X X X X X O O O X X
      U3: X X X X X X X X X X X O X O X
  24. Backward Cumulative Features — Motivation
      Same logs table, annotated: the users differ in active period and in number of logs.
  25. Backward Cumulative Features
      Consider only the logs in the last N days (here N=2, on the same logs table).
  26. As above with N=3.
  27. As above with N=4.
  28. As above with N=5.
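Assuming "last N days" is measured back from the end of the log window, a minimal sketch that derives one feature set per N; data and column names are toy stand-ins.

    import pandas as pd

    # Toy logs; column names are hypothetical.
    logs = pd.DataFrame({
        "enrollment_id": [1, 1, 2, 2, 3],
        "event": ["video", "problem", "video", "wiki", "video"],
        "time": pd.to_datetime(["2013-10-14", "2013-10-26", "2013-10-20",
                                "2013-10-27", "2013-10-13"]),
    })
    window_end = logs["time"].max()

    # One bag-of-words feature set per horizon N = 1 .. 30.
    feature_sets = {}
    for n in range(1, 31):
        recent = logs[logs["time"] > window_end - pd.Timedelta(days=n)]
        feature_sets[n] = pd.crosstab(recent["enrollment_id"], recent["event"])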
  29. Backward Cumulative Features (2 strategies)
      Raw Data (Student, Course, Time) → Backward Cumulation with N = 1, 2, 3, …, 29, 30 → Feature Set 1 through Feature Set 30.
  30. Backward Cumulative Features (2 strategies)
      Strategy 1: concatenate all 30 feature sets and feed them to a single classifier.
  31. Backward Cumulative Features (2 strategies)
      Strategy 2: build 30 distinct models, one per feature set, and average their predictions.
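A toy sketch of Strategy 2 with random stand-in feature sets; the per-set classifier and its settings are illustrative.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 200)
    train_sets = {n: rng.normal(size=(200, 10)) for n in range(1, 31)}
    test_sets = {n: rng.normal(size=(50, 10)) for n in range(1, 31)}

    # One classifier per backward-cumulative feature set ...
    models = {n: GradientBoostingClassifier(n_estimators=50).fit(X, y)
              for n, X in train_sets.items()}

    # ... then average the 30 predicted dropout probabilities.
    avg_pred = np.mean([models[n].predict_proba(test_sets[n])[:, 1]
                        for n in range(1, 31)], axis=0)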
  32. Prediction Model Overview (final weighting)
      Final Prediction = solution 1 * 0.5 + solution 2 * 0.5, i.e. an equal-weight linear combination of the scikit-learn solution and the xgboost solution.
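This weighting is just an element-wise average of the two solutions' predicted probabilities; the numbers below are toy values.

    import numpy as np

    pred_solution_1 = np.array([0.82, 0.10, 0.55])  # scikit-learn ensemble
    pred_solution_2 = np.array([0.78, 0.20, 0.61])  # xgboost
    final_prediction = 0.5 * pred_solution_1 + 0.5 * pred_solution_2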
  33. What We Learned from the Competition
      • Teamwork is important
      — share ideas
      — share solutions
      • Model diversity & feature diversity
      — diverse models and features capture different characteristics of the data
      • Understand the data
      — the goal
      — the evaluation metric
      — the data structure
  34. (Same list as above, plus:) Start earlier …
  35. (Same list, plus:) Start earlier … several things need to be discussed early, e.g. feature format, data partition, feature scale.
  36. Any Questions? changecandy [at] gmail.com
