KDD CUP 2015 - 9th solution

志明 陳
Team: NCCU
A Linear Ensemble of Classification Models with Novel Backward Cumulative Features for MOOC Dropout Prediction
Chih-Ming Chen, Man-Kwan Shan, Ming-Feng Tsai, Yi-Hsuan Yang, Hsin-Ping Chen, Pei-Wen Yeh, and Sin-Ya Peng
Research Center for Information Technology Innovation, Academia Sinica
Department of Computer Science, National Chengchi University
Linear ensemble: a linear combination of several models.
Backward cumulative features: the proposed data engineering method, which can generate many distinct feature sets.
Key Point Summary
Latent Space Representation

— Clustering Model

— Skip-Gram Model
Backward Cumulative Features

— Generate 30 distinct sets of features
Linear Model + Tree-based Model
alleviate the feature sparsity problem
alleviate the bias problem of statistical features
good match
weakness when using sparse features
Workflow (2 settings)
Train Data is split into Train Data (75%) and Validate Data (25%).
[Figure: two histograms, "Training Date Distribution" and "Testing Date Distribution", with dates from 10/27/2013 to 7/27/2014 on the x-axis]
Split the training data based on the time distribution, for stable results.
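One possible reading of "split based on the time distribution" is a split stratified by time bucket, so that the 75% and 25% parts follow the same date distribution. The sketch below assumes that reading; the record fields (`enrollment_id`, `month`) and the helper name are hypothetical and not taken from the slides.

```python
import random
from collections import defaultdict

def split_by_time_bucket(records, bucket_of, train_fraction=0.75, seed=0):
    """Split records so each time bucket contributes ~train_fraction to train."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for record in records:
        buckets[bucket_of(record)].append(record)
    train, valid = [], []
    for key in sorted(buckets):
        items = buckets[key]
        rng.shuffle(items)  # random assignment inside one time bucket
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        valid.extend(items[cut:])
    return train, valid

# Hypothetical enrollments: 24 records spread evenly over 3 months.
records = [
    {"enrollment_id": i, "month": "2014-%02d" % (1 + i % 3)}
    for i in range(24)
]
train, valid = split_by_time_bucket(records, bucket_of=lambda r: r["month"])
```

If the team instead cut chronologically (oldest 75% for training), only the body of the loop would change; the slides do not say which variant was used.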
Workflow (2 settings)
Method 1 (cross-validation): learn a model on Train Data (75%), then run an offline evaluation on Validate Data (25%).
Check if it leads to better performance.
Workflow (2 settings)
Method 2: learn a model on the full Train Data, predict the Test Data, and make a submission.
Check if it leads to better performance.
Prediction Model Overview (2 solutions)
Features extracted from Raw Data (Student, Course, Time) feed Logistic Regression, Gradient Boosting Classifier, and Support Vector Classifier.
A classical approach to a general prediction task.
Prediction Model Overview (2 solutions)
Added to the classical approach: Gradient Boosting Decision Trees and Backward Cumulation.
Backward Cumulation is the feature engineering tailored to the MOOC dataset.
Prediction Model Overview (2 solutions)
Raw Data: Student, Course, Time
Features, with Backward Cumulation
Models: Logistic Regression, Gradient Boosting Classifier, Support Vector Classifier, Gradient Boosting Decision Trees
A Linear Combination of solution 1 and solution 2 gives the Final Prediction.
Libraries: scikit-learn (http://scikit-learn.org/stable/) and xgboost (https://github.com/dmlc/xgboost)
Feature Extraction / Feature Engineering
Feature Extraction
Student
Course
Time
Enrolment ID
• Bag-of-words Feature

— Boolean (0/1)

— Term Frequency (TF)
• Probability Value

— Ratio

— Naive Bayes
• Latent Space Feature

— Clustering

— DeepWalk
Raw

Data
Bag-of-words Feature
describing the status
e.g. the month of the course, the number of registrations
video 5
problem 10
wiki 0
discussion 2
navigation 0
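The two bag-of-words variants, Boolean (0/1) and Term Frequency (TF), can be sketched as follows. The five event names are taken from the example counts on the slide; the function name and the input format (a flat list of event names per enrolment) are assumptions made for illustration.

```python
from collections import Counter

# Event types as they appear in the slide's example counts.
EVENT_TYPES = ["video", "problem", "wiki", "discussion", "navigation"]

def bag_of_words(events):
    """Return (term-frequency vector, boolean vector) for one enrolment."""
    counts = Counter(events)
    tf = [counts.get(event, 0) for event in EVENT_TYPES]
    boolean = [1 if count > 0 else 0 for count in tf]
    return tf, boolean

# Hypothetical log reproducing the slide's counts.
events = ["video"] * 5 + ["problem"] * 10 + ["discussion"] * 2
tf, boolean = bag_of_words(events)
print(tf)       # [5, 10, 0, 2, 0]
print(boolean)  # [1, 1, 0, 1, 0]
```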
Probability Value
dropout probability: estimate the probability from observed data
e.g. the dropout ratio of the course
objects: O = {O1, O2, …, Od}
P( dropout | containing objects ) := P(O1|dropout) … P(Od|dropout)
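A minimal sketch of both probability features, assuming labels of 1 for dropout and 0 otherwise. The add-alpha smoothing, the function names, and the toy data are assumptions that do not come from the slides; the score follows the product form written above.

```python
import math
from collections import Counter

def dropout_ratio(enrollments):
    """Ratio feature: fraction of dropouts observed per course."""
    total, dropped = Counter(), Counter()
    for course, label in enrollments:
        total[course] += 1
        dropped[course] += label
    return {course: dropped[course] / total[course] for course in total}

def fit_likelihoods(samples, alpha=1.0):
    """Estimate P(object | dropout); add-alpha smoothing is an assumption."""
    counts, vocab, n_dropout = Counter(), set(), 0
    for objects, label in samples:
        vocab.update(objects)
        if label == 1:
            n_dropout += 1
            counts.update(set(objects))  # presence, not frequency
    return {o: (counts[o] + alpha) / (n_dropout + 2 * alpha) for o in vocab}

def naive_bayes_score(objects, likelihood):
    """Product P(O1|dropout) ... P(Od|dropout), computed in log space."""
    logs = [math.log(likelihood[o]) for o in set(objects) if o in likelihood]
    return math.exp(sum(logs))

ratios = dropout_ratio([("A", 1), ("A", 0), ("B", 1), ("B", 1)])
likelihood = fit_likelihoods([(["video", "wiki"], 1), (["video"], 1), (["wiki"], 0)])
score = naive_bayes_score(["video", "wiki"], likelihood)
```

Summing logarithms instead of multiplying probabilities avoids underflow when an enrolment contains many objects.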
Latent Space Feature
Latent Topic: K-means Clustering on
1. registered courses
2. containing objects
Some features are sparse: DeepWalk / Skip-Gram for obtaining a dense feature representation.
DeepWalk
https://github.com/phanein/deepwalk
The Goal: find a representation for each node of a graph.
It extends word2vec's Skip-Gram model.
The core idea is to model context information (in practice, a node's neighbours).
Similar objects are mapped to nearby points in the latent space.
From DeepWalk to the MOOC Problem
https://github.com/phanein/deepwalk
Graph: users U1, U2, U3 and courses Course A to Course E
Random Walk: treat random walks on the heterogeneous graph as sentences.
Learned representations:
U1: 0.3 0.2 -0.1 0.5 -0.8
Course B: 0.1 0.3 -0.5 1.2 -0.3
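The random-walk step can be sketched with the standard library alone. The enrolment edges below are hypothetical, since the slide's actual edges are not recoverable from the text, and the walk length and seed are arbitrary. The resulting "sentences" are what a Skip-Gram trainer would then consume to produce the dense vectors.

```python
import random

# Hypothetical user -> courses enrolments (edges are illustrative only).
enrollments = {
    "U1": ["Course A", "Course B"],
    "U2": ["Course B", "Course C"],
    "U3": ["Course C", "Course D", "Course E"],
}

def build_graph(enrollments):
    """Undirected user-course graph stored as adjacency lists."""
    graph = {}
    for user, courses in enrollments.items():
        for course in courses:
            graph.setdefault(user, []).append(course)
            graph.setdefault(course, []).append(user)
    return graph

def random_walk(graph, start, length, rng):
    """Uniform random walk of `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

graph = build_graph(enrollments)
rng = random.Random(0)
# One walk per node; each walk is one "sentence" of node names.
sentences = [random_walk(graph, node, 6, rng) for node in sorted(graph)]
```

Because the graph is bipartite, every walk alternates between users and courses, so each user's context window contains both courses and the other users who share them.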
Performance
Bag-of-words: > 0.890
Bag-of-words + Probability: > 0.901
Bag-of-words + Probability + Naive Bayes: > 0.902
Bag-of-words + Probability + Naive Bayes + Latent Space: > 0.903
Backward Cumulation
Models Combination
Backward Cumulative Features: Motivation
Logs Table
     10/13 10/14 10/15 10/16 10/17 10/18 10/19 10/20 10/21 10/22 10/23 10/24 10/25 10/26 10/27
U1   O     X     O     X     O     O     X     O     X     X     X     X     X     X     O
U2   X     X     X     X     O     X     X     X     X     X     O     O     O     X     X
U3   X     X     X     X     X     X     X     X     X     X     X     O     X     O     X
Users are active over different periods and have different numbers of logs.
Backward Cumulative Features
Consider only the logs in the last N days of the Logs Table (N = 2, 3, 4, 5, ...).
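A minimal sketch of the "last N days" idea: for each N, count events only inside the window that ends on the final day. Anchoring the window at a fixed end date, the `(date, event)` log format, the year 2013, and the function name are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

def backward_cumulative_features(logs, end_date, max_days=30):
    """Map N -> event counts over the last N days (end_date inclusive)."""
    feature_sets = {}
    for n in range(1, max_days + 1):
        start = end_date - timedelta(days=n - 1)
        in_window = (event for day, event in logs if start <= day <= end_date)
        feature_sets[n] = dict(Counter(in_window))
    return feature_sets

# Hypothetical logs for one enrolment.
logs = [
    (date(2013, 10, 27), "video"),
    (date(2013, 10, 26), "problem"),
    (date(2013, 10, 20), "video"),
]
features = backward_cumulative_features(logs, end_date=date(2013, 10, 27))
print(features[1])  # {'video': 1}
print(features[8])  # {'video': 2, 'problem': 1}
```

Each value of N yields its own feature set, which is how one log table turns into 30 distinct sets.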
Backward Cumulative Features (2 strategies)
Backward Cumulation over the Raw Data (Student, Course, Time) produces Feature Set 1 (N=1), Feature Set 2 (N=2), Feature Set 3 (N=3), ..., Feature Set 29 (N=29), Feature Set 30 (N=30).
Strategy 1. Concatenate all features and train one classifier.
Strategy 2. Build 30 distinct models, one classifier per feature set, and average their predictions.
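The difference between the two strategies can be sketched with toy stand-ins. The per-window "models" below are placeholder functions, not trained classifiers, and all names and numbers are hypothetical.

```python
def concatenate(feature_sets):
    """Strategy 1: one long vector holding every window's features."""
    return [value for n in sorted(feature_sets) for value in feature_sets[n]]

def average_predictions(models, feature_sets):
    """Strategy 2: one model per window, predictions averaged."""
    predictions = [models[n](feature_sets[n]) for n in sorted(feature_sets)]
    return sum(predictions) / len(predictions)

# Toy feature sets for N = 1, 2, 3 (a real run would use N = 1..30).
feature_sets = {1: [1, 0], 2: [2, 1], 3: [4, 1]}
# Placeholder scoring functions standing in for trained classifiers.
models = {n: (lambda x: sum(x) / 10) for n in feature_sets}

combined = concatenate(feature_sets)
averaged = average_predictions(models, feature_sets)
```

Strategy 1 lets a single model weigh the windows against each other, while Strategy 2 keeps every model small and gains diversity from the averaging.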
Prediction Model Overview
Raw Data: Student, Course, Time
Features, with Backward Cumulation
Models: Logistic Regression, Gradient Boosting Classifier, Support Vector Classifier, Gradient Boosting Decision Trees
Final Prediction = Linear Combination: solution 1 * 0.5 + solution 2 * 0.5
Libraries: xgboost, scikit-learn
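The final blend is a weighted sum of the two solutions' predicted probabilities. In this sketch the 0.5 weights come from the slide, while the function name and the sample predictions are made up for illustration.

```python
def linear_blend(pred_a, pred_b, weight_a=0.5, weight_b=0.5):
    """Element-wise weighted sum of two prediction lists."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [weight_a * a + weight_b * b for a, b in zip(pred_a, pred_b)]

# Hypothetical dropout probabilities from solution 1 and solution 2.
solution_1 = [0.2, 0.8, 0.5]
solution_2 = [0.4, 0.6, 0.9]
final_prediction = linear_blend(solution_1, solution_2)
```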
What We Learned from the Competition
• Teamwork is important

— share ideas

— share solutions
• Model diversity & feature diversity

— diverse models / features can capture different characteristics of the data
• Understand the data

— the goal

— the evaluation metric

— the data structure
Start earlier … several things to be discussed, e.g.
Feature Format
Data Partition
Feature Scale
Any Questions?
changecandy [at] gmail.com
KDD CUP 2015 - 9th solution

  • 1. Team: NCCU A Linear Ensemble of Classification Models with Novel Backward Cumulative Features for MOOC Dropout Prediction Chih-Ming Chen, Man-Kwan Shan,
 Ming-Feng Tsai, Yi-Hsuan Yang,
 Hsin-Ping Chen, Pei-Wen Yeh,
 and Sin-Ya Peng Research Center for
 Information Technology Innovation,
 Academia Sinica Department of Computer Science, National Chengchi University
  • 2. Team: NCCU A Linear Ensemble of Classification Models with Novel Backward Cumulative Features for MOOC Dropout Prediction Chih-Ming Chen, Man-Kwan Shan,
 Ming-Feng Tsai, Yi-Hsuan Yang,
 Hsin-Ping Chen, Pei-Wen Yeh,
 and Sin-Ya Peng Research Center for
 Information Technology Innovation,
 Academia Sinica Department of Computer Science, National Chengchi University linearly combination of several models the proposed data engineering method it’s able to generate a bunch of distinct feature sets
  • 3. Key Point Summary Latent Space Representation
 — Clustering Model
 — Skip-Gram Model Backward Cumulative Features
 — Generate 30 distinct sets of features Linear Model
 +
 Tree-based Model alleviate the feature sparsity problem alleviate the bias problem of statistical feature good match weakness when using sparse feature
  • 4. Workflow Train Data (75%) Validate Data (25%) Train Data 0" 50000" 100000" 150000" 200000" 10/27/2013" 11/27/2013" 12/27/2013"1/27/2014"2/27/2014" 3/27/2014"4/27/2014" 5/27/2014"6/27/2014" 7/27/2014" Training'Date'Distribu.on 0" 20000" 40000" 60000" 80000" 100000" 120000" 140000" 10/27/2013" 11/27/2013" 12/27/2013"1/27/2014"2/27/2014" 3/27/2014"4/27/2014" 5/27/2014"6/27/2014" 7/27/2014" Tes.ng'Date'Distribu.on Split the training data based on
 the time distribution.
 stable results — 2 settings
  • 5. Workflow Train Data (75%) Validate Data (25%) Learned
 Model Offline Evaluation Test Data Train Data Submission method 1 cross-validation — 2 settings check if it leads to better performance
  • 6. Workflow Train Data (75%) Validate Data (25%) Learned
 Model Offline Evaluation Test Data Train Data Learned
 Model Submission method 2 — 2 settings check if it leads to better performance
  • 7. Workflow (2 settings). Diagram: Train Data (75%), Validate Data (25%), Learned Model, Offline Evaluation; Train Data, Learned Model, Test Data, Submission.
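The 75% / 25% split "based on the time distribution" could be sketched as below. This is only one reading of the slide, assuming the split samples within each month so that both parts keep a similar date distribution. The record format and the name `split_by_time` are hypothetical.

```python
import random
from collections import defaultdict


def split_by_time(records, valid_fraction=0.25, seed=0):
    """Split enrolment IDs into train / validation parts, month by month.

    records: list of (enrollment_id, 'YYYY-MM-DD') pairs (hypothetical format).
    Sampling inside each month keeps the date distributions of the two
    parts similar (an assumption about what the slide means).
    """
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for enrollment_id, day in records:
        by_month[day[:7]].append(enrollment_id)  # key is 'YYYY-MM'

    train, valid = [], []
    for month in sorted(by_month):
        ids = by_month[month]
        rng.shuffle(ids)
        n_valid = round(len(ids) * valid_fraction)
        valid.extend(ids[:n_valid])
        train.extend(ids[n_valid:])
    return train, valid


# Toy data: 40 enrolments in one month, 80 in another.
records = [(i, "2013-11-%02d" % (i % 28 + 1)) for i in range(40)]
records += [(i, "2014-05-%02d" % (i % 28 + 1)) for i in range(40, 120)]
train_ids, valid_ids = split_by_time(records)
print(len(train_ids), len(valid_ids))
```

A purely chronological split (earliest 75% for training) is another possible reading; the slides do not say which was used.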
  • 8. Prediction Model Overview (2 solutions). Diagram: Raw Data (Student, Course, Time); Features; Logistic Regression, Gradient Boosting Classifier, Support Vector Classifier. A classical approach to a general prediction task.
  • 9. Prediction Model Overview (2 solutions). Diagram adds: Backward Cumulation; Gradient Boosting Decision Trees. This is the feature engineering designed for the MOOC dataset.
  • 10. Prediction Model Overview (2 solutions). Diagram adds: Linear Combination; Final Prediction.
  • 11. Prediction Model Overview (2 solutions). Diagram labels the two branches solution 1 and solution 2. Tools: xgboost (https://github.com/dmlc/xgboost) and scikit-learn (http://scikit-learn.org/stable/).
  • 12. Prediction Model Overview (2 solutions). Diagram highlights the step from Raw Data to the models: Feature Extraction / Feature Engineering.
  • 13. Feature Extraction. Raw Data: Student, Course, Time, Enrolment ID. Feature types: Bag-of-words Feature (Boolean 0/1; Term Frequency, TF); Probability Value (Ratio; Naive Bayes); Latent Space Feature (Clustering; DeepWalk).
  • 14. Feature Extraction: Bag-of-words Feature. Features describing the status, e.g. the month of the course, the number of registrations. Example counts per event type: video 5, problem 10, wiki 0, discussion 2, navigation 0.
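A minimal sketch of the two bag-of-words variants (Boolean and term frequency), using the event counts from the slide as input. The function name `bag_of_words` and the flat list-of-events input are assumptions made for illustration.

```python
from collections import Counter

# Event types taken from the slide's example.
EVENT_TYPES = ["video", "problem", "wiki", "discussion", "navigation"]


def bag_of_words(events, vocabulary=EVENT_TYPES):
    """Return (boolean, term_frequency) vectors for one enrolment's events."""
    counts = Counter(events)
    tf = [counts[e] for e in vocabulary]          # how often each event occurs
    boolean = [1 if c > 0 else 0 for c in tf]     # whether it occurs at all
    return boolean, tf


events = ["video"] * 5 + ["problem"] * 10 + ["discussion"] * 2
boolean, tf = bag_of_words(events)
print(tf)       # [5, 10, 0, 2, 0]
print(boolean)  # [1, 1, 0, 1, 0]
```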
  • 15. Feature Extraction: Probability Value. Dropout probability given the objects an enrolment contains, $O = \{O_1, O_2, \dots, O_d\}$: $P(\text{dropout} \mid O) := P(O_1 \mid \text{dropout}) \cdots P(O_d \mid \text{dropout})$. Ratio example: the dropout ratio of the course. The probabilities are estimated from observed data.
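The two probability features might be computed roughly as follows. This is a sketch, not the team's code: the add-alpha smoothing, the data layout, and all function names are assumptions, and the score is the plain product of per-object likelihoods as written on the slide.

```python
import math
from collections import Counter


def fit_object_likelihoods(enrollments, alpha=1.0):
    """Estimate P(O | dropout) from observed data.

    enrollments: list of (set_of_objects, dropped_out) pairs (hypothetical).
    alpha: add-alpha smoothing, an assumption not stated on the slide.
    """
    n_dropout = sum(1 for _, dropped in enrollments if dropped)
    counts = Counter()
    for objects, dropped in enrollments:
        if dropped:
            counts.update(objects)

    def likelihood(obj):
        return (counts[obj] + alpha) / (n_dropout + 2 * alpha)

    return likelihood


def naive_bayes_score(objects, likelihood):
    """Product of P(O_i | dropout) over the objects, computed in log space."""
    return math.exp(sum(math.log(likelihood(o)) for o in objects))


def dropout_ratio(labels):
    """Ratio feature: fraction of enrolments (e.g. of one course) that dropped out."""
    return sum(labels) / len(labels)


enrollments = [({"a", "b"}, True), ({"a"}, True), ({"b", "c"}, False), ({"a", "c"}, True)]
likelihood = fit_object_likelihoods(enrollments)
score = naive_bayes_score({"a", "b"}, likelihood)
ratio = dropout_ratio([dropped for _, dropped in enrollments])
print(round(score, 4), ratio)
```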
  • 16. Feature Extraction: Latent Space Feature. Latent topics: K-means clustering on (1) registered courses and (2) containing objects. Some features are sparse, so DeepWalk / Skip-Gram is used to obtain a dense feature representation.

  • 17. DeepWalk (https://github.com/phanein/deepwalk). The goal: find a representation of each node of a graph. It extends word2vec's Skip-Gram model.
  • 18. DeepWalk (https://github.com/phanein/deepwalk). The core idea is to model the context information (in practice, the node's neighbours). Similar objects are mapped to nearby points in the latent space.
  • 19. From DeepWalk to the MOOC Problem (https://github.com/phanein/deepwalk). Graph nodes: users U1, U2, U3 and courses Course A, Course B, Course C, Course D, Course E.
  • 20. From DeepWalk to the MOOC Problem (https://github.com/phanein/deepwalk). A random walk visits user and course nodes such as U1, Course B, Course C and U2. Treat random walks on the heterogeneous graph as sentences.
  • 21. From DeepWalk to the MOOC Problem (https://github.com/phanein/deepwalk). Example learned representations: U1 = (0.3, 0.2, -0.1, 0.5, -0.8); Course B = (0.1, 0.3, -0.5, 1.2, -0.3).
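Generating the random-walk "sentences" could look like the sketch below. The user-course edges are hypothetical (the slide's actual edges are not recoverable from the transcript), and the walk parameters and function names are illustrative assumptions rather than DeepWalk's own API.

```python
import random


def build_graph(enrollments):
    """Undirected user-course graph stored as adjacency lists."""
    graph = {}
    for user, course in enrollments:
        graph.setdefault(user, []).append(course)
        graph.setdefault(course, []).append(user)
    return graph


def random_walks(graph, walks_per_node=2, walk_length=6, seed=0):
    """Uniform random walks; each walk is later treated as one sentence."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical enrolment edges between the slide's users and courses.
enrollments = [
    ("U1", "Course A"), ("U1", "Course B"),
    ("U2", "Course B"), ("U2", "Course C"),
    ("U3", "Course C"), ("U3", "Course D"), ("U3", "Course E"),
]
graph = build_graph(enrollments)
walks = random_walks(graph)
print(walks[0])
```

The resulting walks can then be fed, as sentences, to any Skip-Gram implementation to obtain a dense vector per user and per course.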
  • 22. Performance, as feature sets are added: Bag-of-words > 0.890; Bag-of-words + Probability > 0.901; Bag-of-words + Probability + Naive Bayes > 0.902; Bag-of-words + Probability + Naive Bayes + Latent Space > 0.903. Further steps: Backward Cumulation, Models Combination.
  • 23. Backward Cumulative Features: Motivation. [Logs table: rows U1, U2, U3; columns 10/13 to 10/27; each cell is marked O or X according to whether the user has logs on that day.]
  • 24. Backward Cumulative Features: Motivation. [Logs table: rows U1, U2, U3; columns 10/13 to 10/27.] Users are active over different periods and have different numbers of logs.
  • 25. Backward Cumulative Features. Consider only the logs in the last N days. N=2. [Logs table: U1, U2, U3; 10/13 to 10/27.]
  • 26. Backward Cumulative Features. Consider only the logs in the last N days. N=3.
  • 27. Backward Cumulative Features. Consider only the logs in the last N days. N=4.
  • 28. Backward Cumulative Features. Consider only the logs in the last N days. N=5.
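The "last N days" idea can be sketched as one feature set per window size. The sketch assumes the window is counted back from the final day of the observed log period (`end_date`) and uses simple per-event counts as the features; both choices, and the function name, are assumptions.

```python
from collections import Counter
from datetime import date, timedelta

EVENT_TYPES = ["video", "problem", "wiki", "discussion", "navigation"]


def backward_cumulative_features(logs, end_date, max_days=30):
    """Build one count vector per window size N = 1..max_days.

    logs: list of (date, event_type) pairs for one enrolment (hypothetical).
    Window N covers the N days ending at end_date, inclusive.
    """
    feature_sets = {}
    for n in range(1, max_days + 1):
        first_day = end_date - timedelta(days=n - 1)
        counts = Counter(e for d, e in logs if first_day <= d <= end_date)
        feature_sets[n] = [counts[e] for e in EVENT_TYPES]
    return feature_sets


logs = [
    (date(2013, 10, 20), "video"),
    (date(2013, 10, 26), "problem"),
    (date(2013, 10, 27), "video"),
    (date(2013, 10, 27), "problem"),
]
sets = backward_cumulative_features(logs, end_date=date(2013, 10, 27))
print(sets[1])  # [1, 1, 0, 0, 0]
print(sets[2])  # [1, 2, 0, 0, 0]
print(sets[8])  # [2, 2, 0, 0, 0]
```

Each window size yields a different view of the same enrolment, which is how 30 distinct feature sets arise from one log table.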
  • 29. Backward Cumulative Features (2 strategies). Raw Data (Student, Course, Time) passes through Backward Cumulation with N=1, N=2, N=3, ..., N=29, N=30, giving Feature Set 1, Feature Set 2, Feature Set 3, ..., Feature Set 29, Feature Set 30.
  • 30. Backward Cumulative Features (2 strategies). Strategy 1: concatenate all features (Feature Set 1 to Feature Set 30) and feed them to one Classifier.
  • 31. Backward Cumulative Features (2 strategies). Strategy 2: build 30 distinct models, one Classifier per feature set, and average them.
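The two strategies can be contrasted in a small sketch. Here `feature_sets` maps each N to a list of per-example feature vectors, and `predict_fns` maps each N to a callable standing in for a trained classifier; both structures and the toy scorer are hypothetical placeholders, not the models used by the team.

```python
def strategy_concatenate(feature_sets):
    """Strategy 1: join all feature sets into one long vector per example."""
    n_examples = len(next(iter(feature_sets.values())))
    return [
        [x for n in sorted(feature_sets) for x in feature_sets[n][i]]
        for i in range(n_examples)
    ]


def strategy_average(feature_sets, predict_fns):
    """Strategy 2: one model per feature set, then average the predictions."""
    n_examples = len(next(iter(feature_sets.values())))
    totals = [0.0] * n_examples
    for n in sorted(feature_sets):
        for i, p in enumerate(predict_fns[n](feature_sets[n])):
            totals[i] += p
    return [t / len(feature_sets) for t in totals]


# Three feature sets (N = 1..3) for two examples, to keep the demo short.
feature_sets = {
    1: [[1, 0], [0, 0]],
    2: [[2, 1], [0, 1]],
    3: [[2, 3], [1, 1]],
}


def toy_model(rows):
    """Stand-in scorer: more activity gives a lower dropout score."""
    return [1.0 / (1.0 + sum(row)) for row in rows]


predict_fns = {n: toy_model for n in feature_sets}
concatenated = strategy_concatenate(feature_sets)
averaged = strategy_average(feature_sets, predict_fns)
print(concatenated[0])
print([round(p, 3) for p in averaged])
```

Strategy 1 gives a single model access to every window at once; strategy 2 keeps each model small and relies on averaging to combine the views.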
  • 32. Prediction Model Overview. Diagram: Raw Data (Student, Course, Time); Backward Cumulation; Logistic Regression, Gradient Boosting Classifier, Support Vector Classifier, Gradient Boosting Decision Trees; Linear Combination; Final Prediction. The final prediction is solution 1 * 0.5 + solution 2 * 0.5. Tools: xgboost, scikit-learn.
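The final blend with equal weights of 0.5 amounts to a weighted sum of the two solutions' predicted probabilities. A minimal sketch, with made-up prediction values and a hypothetical helper name:

```python
def linear_combination(predictions, weights):
    """Weighted sum of several models' predictions, example by example."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per prediction list")
    n_examples = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions))
        for i in range(n_examples)
    ]


# Made-up dropout probabilities from the two solutions for three enrolments.
solution_1 = [0.9, 0.2, 0.6]
solution_2 = [0.7, 0.4, 0.5]
final_prediction = linear_combination([solution_1, solution_2], [0.5, 0.5])
print([round(p, 2) for p in final_prediction])
```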
  • 33. What We Learned from the Competition. Team work is important: share ideas, share solutions. Model diversity & feature diversity: diverse models / features can capture different characteristics of the data. Understand the data: the goal, the evaluation metric, the data structure.
  • 34. What We Learned from the Competition (added point). Start earlier …
  • 35. What We Learned from the Competition (added point). Several things need to be discussed early, e.g. Feature Format, Data Partition, Feature Scale.