KDD CUP 2015 - 9th solution

1. Team: NCCU
A Linear Ensemble of Classification Models with Novel Backward Cumulative Features for MOOC Dropout Prediction
Chih-Ming Chen, Man-Kwan Shan, Ming-Feng Tsai, Yi-Hsuan Yang, Hsin-Ping Chen, Pei-Wen Yeh, and Sin-Ya Peng
Research Center for Information Technology Innovation, Academia Sinica
Department of Computer Science, National Chengchi University
2. Team: NCCU
A Linear Ensemble of Classification Models with Novel Backward Cumulative Features for MOOC Dropout Prediction
— "A linear ensemble": a linear combination of several models.
— "Backward cumulative features": the proposed data-engineering method, able to generate many distinct feature sets.
3. Key Point Summary
• Latent Space Representation (alleviates the feature sparsity problem)
— Clustering Model
— Skip-Gram Model
• Backward Cumulative Features (alleviates the bias problem of statistical features)
— Generate 30 distinct sets of features
• Linear Model + Tree-based Model (a good match; covers the weakness of using sparse features)
4. Workflow
Train Data (75%) / Validate Data (25%)
[Charts: training date distribution and testing date distribution, 10/27/2013 – 7/27/2014]
Split the training data based on the time distribution → stable results.
— 2 settings
5. Workflow — submission method 1: cross-validation
[Diagram: Train Data (75%) → Learned Model → Offline Evaluation on Validate Data (25%); full Train Data → cross-validation → Submission on Test Data]
— 2 settings; check if it leads to better performance
6. Workflow — submission method 2
[Diagram: Train Data (75%) → Learned Model → Offline Evaluation on Validate Data (25%); full Train Data → Learned Model → Submission on Test Data]
— 2 settings; check if it leads to better performance
7. Workflow
[Diagram: Train Data (75%) → Learned Model → Offline Evaluation on Validate Data (25%); full Train Data → Learned Model → Submission on Test Data]
— 2 settings
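A minimal sketch of a time-based 75/25 split like the one above, using pandas; the toy log table and the column names (enrollment_id, event_time) are assumptions for illustration, not the team's actual code.

```python
import pandas as pd

# Toy enrolment log; the real data would come from the competition files.
# Assumed columns: enrollment_id and event_time.
logs = pd.DataFrame({
    "enrollment_id": [1, 1, 2, 2, 3, 4],
    "event_time": pd.to_datetime([
        "2014-01-05", "2014-01-20", "2014-03-02",
        "2014-03-15", "2014-05-10", "2014-06-25",
    ]),
})

# Order enrolments by their last activity and cut at 75%, so the
# train/validate split follows the time distribution of the data.
last_event = logs.groupby("enrollment_id")["event_time"].max().sort_values()
cutoff = int(len(last_event) * 0.75)
train_ids = set(last_event.index[:cutoff])
valid_ids = set(last_event.index[cutoff:])

train_logs = logs[logs["enrollment_id"].isin(train_ids)]
valid_logs = logs[logs["enrollment_id"].isin(valid_ids)]
```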
8. Prediction Model Overview — 2 solutions
Raw Data (Student, Course, Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
A classical approach to a general prediction task.
9. Prediction Model Overview — 2 solutions
Raw Data (Student, Course, Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Feature engineering tailored to the MOOC dataset.
10. Prediction Model Overview — 2 solutions
Raw Data (Student, Course, Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Linear Combination → Final Prediction
11. Prediction Model Overview — 2 solutions
Solution 1 (scikit-learn, http://scikit-learn.org/stable/): Raw Data (Student, Course, Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Solution 2 (xgboost, https://github.com/dmlc/xgboost): Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Linear Combination → Final Prediction
12. Prediction Model Overview — 2 solutions
Raw Data (Student, Course, Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Linear Combination → Final Prediction
Highlighted: Feature Extraction / Feature Engineering
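As a rough illustration of the two solutions, here is a hedged sketch with scikit-learn and xgboost; the synthetic matrices and hyperparameters are placeholders, and the mapping of Solution 1/2 to the two libraries follows the model names on the slide rather than any published code.

```python
import numpy as np
import xgboost as xgb
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC

# Synthetic stand-ins for the real feature matrices and dropout labels.
rng = np.random.default_rng(0)
X_feat, X_feat_test = rng.random((300, 20)), rng.random((50, 20))
X_cum, X_cum_test = rng.random((300, 60)), rng.random((50, 60))
y = rng.integers(0, 2, 300)

# Solution 1 (scikit-learn): classical classifiers on the extracted features.
solution1_models = [
    LogisticRegression(max_iter=1000),
    GradientBoostingClassifier(),
    SVC(probability=True),          # probability=True enables predict_proba
]
solution1 = np.mean(
    [m.fit(X_feat, y).predict_proba(X_feat_test)[:, 1] for m in solution1_models],
    axis=0,
)

# Solution 2 (xgboost): gradient boosting decision trees on the
# backward cumulative features.
gbdt = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05)
solution2 = gbdt.fit(X_cum, y).predict_proba(X_cum_test)[:, 1]
# The two solutions are then blended by a linear combination (slide 32).
```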
13. Feature Extraction
Raw Data (Student, Course, Time), keyed by Enrolment ID
• Bag-of-words Feature
— Boolean (0/1)
— Term Frequency (TF)
• Probability Value
— Ratio
— Naive Bayes
• Latent Space Feature
— Clustering
— DeepWalk
14. Feature Extraction — Bag-of-words Feature
— Boolean (0/1)
— Term Frequency (TF)
Describes the status, e.g. the month of the course or the number of registrations.
Example event counts for one enrolment: video 5, problem 10, wiki 0, discussion 2, navigation 0
15. Feature Extraction — Probability Value
— Ratio, e.g. the dropout ratio of the course
— Naive Bayes: estimate the dropout probability from the observed data
For an enrolment containing objects O = {O1, O2, …, Od}:
P(dropout | O1, …, Od) ∝ P(dropout) · P(O1 | dropout) ⋯ P(Od | dropout)
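Written out, the Naive Bayes probability feature takes the standard form below; the prior term and the count-based estimate are the usual textbook formulation, added here for clarity (the slide lists only the likelihood product).

```latex
P(\text{dropout} \mid O_1,\dots,O_d)
  \;\propto\; P(\text{dropout}) \prod_{i=1}^{d} P(O_i \mid \text{dropout}),
\qquad
P(O_i \mid \text{dropout})
  \approx \frac{\#\{\text{dropout enrolments containing } O_i\}}
               {\#\{\text{dropout enrolments}\}}
```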
16. Feature Extraction — Latent Space Feature
— Clustering: Latent Topic / K-means clustering on (1) registered courses and (2) containing objects
— DeepWalk / Skip-Gram for obtaining a dense feature representation
Some features are sparse; the latent space features give a dense alternative.
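A minimal sketch, with scikit-learn, of how the bag-of-words and clustering features could be put together; the toy event strings, the cluster count, and the way the cluster outputs are appended are illustrative assumptions, not the team's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Toy enrolment logs: each enrolment is represented by the event types it touched.
enrolment_events = [
    "video video problem discussion",
    "video problem problem problem wiki",
    "navigation video",
]

# Bag-of-words features: term frequency (TF), or binarized to Boolean (0/1).
vectorizer = CountVectorizer()
tf = vectorizer.fit_transform(enrolment_events)    # term-frequency matrix
boolean = (tf > 0).astype(int)                     # Boolean 0/1 variant

# Latent-space feature via clustering: cluster the sparse TF vectors and use
# the cluster assignment / centroid distances as a dense representation.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_id = kmeans.fit_predict(tf)
cluster_dist = kmeans.transform(tf)                # distance to each centroid

features = np.hstack([tf.toarray(), cluster_dist, cluster_id.reshape(-1, 1)])
```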

17. DeepWalk
https://github.com/phanein/deepwalk
The goal: find a representation for each node of a graph.
It is an extension of word2vec's Skip-Gram model.
18. DeepWalk
https://github.com/phanein/deepwalk
The goal: find a representation for each node of a graph.
It is an extension of word2vec's Skip-Gram model.
The core idea is to model the context information (in practice, a node's neighbours). Similar objects are mapped to nearby points in the latent space.
19. From DeepWalk to the MOOC Problem
[Graph of users U1–U3 connected to Courses A–E]
https://github.com/phanein/deepwalk
20. From DeepWalk to the MOOC Problem
[Graph of users U1–U3 and Courses A–E, with a sampled random walk highlighted]
Treat random walks on the heterogeneous graph as sentences.
https://github.com/phanein/deepwalk
21. From DeepWalk to the MOOC Problem
Treat random walks on the heterogeneous graph as sentences.
Learned embeddings, e.g. U1 → [0.3, 0.2, -0.1, 0.5, -0.8], Course B → [0.1, 0.3, -0.5, 1.2, -0.3]
https://github.com/phanein/deepwalk
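A small sketch of the DeepWalk idea on a toy user–course graph, using gensim's skip-gram Word2Vec as the underlying model (DeepWalk itself wraps the same mechanism); the graph, the number of walks, and the vector size are illustrative assumptions.

```python
import random
from gensim.models import Word2Vec

# Toy heterogeneous graph: users connected to the courses they are enrolled in.
edges = {
    "U1": ["Course A", "Course B"],
    "U2": ["Course B", "Course C"],
    "U3": ["Course C", "Course D", "Course E"],
}
graph = {node: list(neigh) for node, neigh in edges.items()}
for user, courses in edges.items():
    for course in courses:
        graph.setdefault(course, []).append(user)

def random_walk(start, length=10):
    """Sample one random walk; the node sequence plays the role of a sentence."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# Many walks per node form the corpus fed to the Skip-Gram model.
walks = [random_walk(node) for node in graph for _ in range(20)]
model = Word2Vec(walks, vector_size=8, window=3, min_count=1, sg=1, epochs=10)

u1_vec = model.wv["U1"]              # dense vector for user U1
course_b_vec = model.wv["Course B"]  # dense vector for Course B
```

The resulting node vectors are the kind of dense latent space features the slides describe for users and courses.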
22. Performance
Bag-of-words: > 0.890
Bag-of-words + Probability: > 0.901
Bag-of-words + Probability + Naive Bayes: > 0.902
Bag-of-words + Probability + Naive Bayes + Latent Space: > 0.903
Next: Backward Cumulation, Models Combination
23. Backward Cumulative Features — Motivation
[Logs table: daily activity (O = log, X = no log) for U1, U2, U3 over 10/13–10/27]
24. Backward Cumulative Features — Motivation
[Same logs table]
Different enrolments cover different periods and have different numbers of logs.
25. Backward Cumulative Features
Consider only the logs in the last N days (N = 2).
[Logs table with the last 2 days highlighted]
26. Backward Cumulative Features
Consider only the logs in the last N days (N = 3).
[Logs table with the last 3 days highlighted]
27. Backward Cumulative Features
Consider only the logs in the last N days (N = 4).
[Logs table with the last 4 days highlighted]
28. Backward Cumulative Features
Consider only the logs in the last N days (N = 5).
[Logs table with the last 5 days highlighted]
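One possible realisation of the "last N days" idea in pandas; the toy log table, the column names, and the single assumed course end date are illustrative assumptions, not the competition schema.

```python
import pandas as pd

# Toy log table; assumed columns: enrollment_id, event, date.
logs = pd.DataFrame({
    "enrollment_id": [1, 1, 1, 2, 2],
    "event":         ["video", "problem", "video", "wiki", "video"],
    "date": pd.to_datetime(
        ["2014-05-20", "2014-05-28", "2014-06-01", "2014-05-15", "2014-06-01"]),
})
course_end = pd.Timestamp("2014-06-01")    # assumed end date of the course

def backward_cumulative_features(logs, n_days):
    """Count events per enrolment, using only the logs from the last n_days."""
    window_start = course_end - pd.Timedelta(days=n_days - 1)
    recent = logs[logs["date"] >= window_start]
    counts = (recent.groupby(["enrollment_id", "event"]).size()
                    .unstack(fill_value=0))
    return counts.add_suffix(f"_last{n_days}d")

# 30 distinct feature sets, one per backward window N = 1 .. 30.
feature_sets = {n: backward_cumulative_features(logs, n) for n in range(1, 31)}
```

Each feature set describes the same enrolments over a different backward window, which is what the two combination strategies on the next slides consume.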
29. Backward Cumulative Features — 2 strategies
Raw Data (Student, Course, Time) → Backward Cumulation with N = 1, 2, 3, …, 29, 30 → Feature Set 1, Feature Set 2, Feature Set 3, …, Feature Set 29, Feature Set 30
30. Backward Cumulative Features — Strategy 1
Concatenate all 30 feature sets and feed them to a single classifier.
31. Backward Cumulative Features — Strategy 2
Build 30 distinct models, one per feature set, and average their predictions.
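A compact sketch of the two strategies with logistic regression as a stand-in classifier; the synthetic arrays below are placeholders for the 30 backward cumulative feature sets and the dropout labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_enrolments, n_feats = 200, 5
y = rng.integers(0, 2, n_enrolments)              # toy dropout labels
feature_sets = {n: rng.random((n_enrolments, n_feats)) for n in range(1, 31)}

# Strategy 1: concatenate all 30 feature sets and train a single classifier.
X_concat = np.hstack([feature_sets[n] for n in range(1, 31)])
clf = LogisticRegression(max_iter=1000).fit(X_concat, y)
pred_strategy1 = clf.predict_proba(X_concat)[:, 1]

# Strategy 2: build 30 distinct models, one per feature set, and average them.
per_window = [
    LogisticRegression(max_iter=1000)
        .fit(feature_sets[n], y)
        .predict_proba(feature_sets[n])[:, 1]
    for n in range(1, 31)
]
pred_strategy2 = np.mean(per_window, axis=0)
```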
32. Prediction Model Overview
Solution 1 (scikit-learn): Raw Data (Student, Course, Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
Solution 2 (xgboost): Raw Data → Backward Cumulation → Gradient Boosting Decision Trees
Final Prediction = 0.5 × Solution 1 + 0.5 × Solution 2 (linear combination)
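The final blend from this slide, written out; solution1 and solution2 stand for the predicted dropout probabilities of the two pipelines (toy values here).

```python
import numpy as np

# Predicted dropout probabilities from the scikit-learn and xgboost pipelines.
solution1 = np.array([0.10, 0.80, 0.55])
solution2 = np.array([0.20, 0.70, 0.65])

# Final prediction: equally weighted linear combination of the two solutions.
final_prediction = 0.5 * solution1 + 0.5 * solution2   # -> [0.15, 0.75, 0.60]
```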
33. What We Learned from the Competition
• Teamwork is important
— share ideas
— share solutions
• Model diversity & feature diversity
— diverse models / features can capture different characteristics of the data
• Understand the data
— the goal
— the evaluation metric
— the data structure
34. What We Learned from the Competition
Start earlier …
35. What We Learned from the Competition
Start earlier …
Several things still to be discussed, e.g. feature format, data partition, feature scale.
36. Any Questions?
changecandy [at] gmail.com
