Improving the Model’s Predictive Power with Ensemble Approaches

Bagus Sartono, Lecturer at the Department of Statistics, Institut Pertanian Bogor (IPB) University.
New Trends in Research Methodology & Analytics Technology Update, Nov 28, 2012, Jakarta, Indonesia
Transcript of "Improving the Model’s Predictive Power with Ensemble Approaches"

  1. bagusco@gmail.com, bagusco@ipb.ac.id
  2. KDD Cup 2010: Overview
     • The Challenge
       – How generally or narrowly do students learn? How quickly or slowly? Will the rate of improvement vary between students? What does it mean for one problem to be similar to another?
       – Is it possible to infer the knowledge requirements of problems directly from student performance data, without human analysis of the tasks?
       – This year's challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems.
  3. KDD Cup 2010: Results
     • Winners of KDD Cup 2010: All Teams
       – First Place: National Taiwan University, "Feature engineering and classifier ensembling for KDD CUP 2010"
       – First Runner Up: Zhang and Su, "Gradient Boosting Machines with Singular Value Decomposition"
       – Second Runner Up: BigChaos @ KDD, "Collaborative Filtering Applied to Educational Data Mining"
  4. Outline
     • What is Ensemble Learning?
     • Why Ensemble?
     • How good is Ensemble?
     • What next?
  5. Predictive Modeling
     • Widely used in many applications:
       – Business: churn modeling, scoring
       – Science: chemometrics
       – Bio-science: efficacy modeling, classification
       – Academics: admission selection, student performance
  6. Predictive Modeling
     [Diagram: Training Set → Model Development → Predictive Rules; the rules are applied to a New Data Set → Prediction]
  7. Classical Approach: Model Selection
     • Which one is the best?
  8. A New Approach? Ensemble
     • Combine all the models!
  9. What is Ensemble?
     • Single Expert vs. Team of Experts
  10. What is Ensemble?
      [Diagram: Data Set → Training Set #1, Training Set #2, ……, Training Set #k → Learning (one run per training set) → Model #1, Model #2, ……, Model #k → Combiner → Ensemble Prediction]
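The data flow on this slide (k training sets → k models → combiner) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's implementation: the k training sets are assumed to be bootstrap samples of the data set (as in bagging), the learning algorithm is assumed to be a one-split decision stump, the combiner is assumed to be a majority vote, and labels are assumed to be -1 or +1. The names `train_stump`, `predict_stump`, `build_ensemble`, and `ensemble_predict` are hypothetical.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Learning step (assumed base learner): a one-split decision stump that
    minimises training errors over every feature, threshold and orientation."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The stump predicts `sign` when x[j] <= t and -sign otherwise.
                err = sum((sign if xi[j] <= t else -sign) != yi
                          for xi, yi in zip(X, y))
                if err < best_err:
                    best_err, best = err, (j, t, sign)
    return best

def predict_stump(stump, x):
    j, t, sign = stump
    return sign if x[j] <= t else -sign

def build_ensemble(X, y, k=25, seed=0):
    """Training Set #1..#k -> Model #1..#k. Each training set is assumed to be a
    bootstrap sample: n rows drawn from the data set with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def ensemble_predict(models, x):
    """Combiner: majority vote over the k model predictions."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

With two classes and an odd k, the majority vote can never tie.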
  11. Types of Ensemble
      • Hybrid Ensemble
        – Combining several different learning algorithms into one prediction
        – e.g., combining the results of regression, decision trees, neural networks, and support vector machines
      • Non-Hybrid Ensemble
        – Combining several learning models from the same algorithm into one prediction
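A hybrid ensemble follows the same pattern, except that the members come from different algorithms. In this minimal sketch the three members are a nearest-centroid classifier, a 3-nearest-neighbour classifier, and a decision stump; they stand in (as an assumption, to keep the example dependency-free) for the regression, tree, neural-network, and SVM models named on the slide. The combiner is a plain majority vote, the stump assumes two class labels, all function names are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

def fit_centroid(X, y):
    """Member 1: nearest-centroid classifier (stores each class's mean vector)."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(cents, key=lambda c: math.dist(x, cents[c]))

def fit_knn(X, y, k=3):
    """Member 2: k-nearest-neighbour classifier (majority label of the k closest rows)."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        return Counter(t for _, t in nearest).most_common(1)[0][0]
    return predict

def fit_stump(X, y):
    """Member 3: decision stump on the best single feature/threshold (binary labels)."""
    labels = sorted(set(y))
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            # Try both orientations: (label if x[j] <= t, label otherwise).
            for lo, hi in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                err = sum((lo if x[j] <= t else hi) != yi for x, yi in zip(X, y))
                if err < best_err:
                    best_err, best = err, (j, t, lo, hi)
    j, t, lo, hi = best
    return lambda x: lo if x[j] <= t else hi

def fit_hybrid(X, y):
    """Hybrid ensemble: three different algorithms combined by majority vote."""
    members = [fit_centroid(X, y), fit_knn(X, y), fit_stump(X, y)]
    return lambda x: Counter(m(x) for m in members).most_common(1)[0][0]
```

A non-hybrid ensemble would instead fit several copies of one of these learners (for example on different training samples) and vote in the same way.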
  12. Well-Known Ensembles
      • Bagging
        – Fit a learning model to each bootstrap sample
        – Aggregate the predictions by averaging or majority vote
      • Boosting (AdaBoost)
        – Fit learning models sequentially, giving higher weight to 'difficult' (previously misclassified) cases
        – Combine the predictions with a weighted vote, in which more accurate models receive larger weights
      • Random Forest
        – Similar to bagging, except that each tree is grown with random feature selection (only a random subset of the features is considered at each split)
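The boosting bullets can be made concrete with discrete AdaBoost for labels -1/+1. In this sketch (assumptions: decision stumps as the base learner, hypothetical function names) each round fits a stump to the current case weights, gives it a vote weight alpha = 1/2 · ln((1 - err)/err), multiplies the weight of every misclassified case by e^alpha and every correctly classified case by e^(-alpha), and renormalises. That update is how 'difficult' cases gain weight, and alpha is the weight used when the predictions are combined.

```python
import math

def train_weighted_stump(X, y, w):
    """Fit a one-split decision stump that minimises the weighted error.
    Returns ((feature, threshold, sign), weighted_error)."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                # The stump predicts s when x[j] <= t and -s otherwise.
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (s if x[j] <= t else -s) != yi)
                if err < best_err:
                    best_err, best = err, (j, t, s)
    return best, best_err

def stump_predict(stump, x):
    j, t, s = stump
    return s if x[j] <= t else -s

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n  # start with equal case weights
    ensemble = []
    for _ in range(rounds):
        stump, err = train_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        if err >= 0.5:  # no better than chance on the weighted data: stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)  # lower error -> larger vote
        ensemble.append((alpha, stump))
        # Misclassified cases (y * h(x) = -1) are multiplied by e^alpha,
        # correctly classified ones by e^-alpha; then renormalise to sum to 1.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, x))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote: sign of the alpha-weighted sum of stump predictions."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

For example, with X = [[1], [2], [3], [4], [5], [6]] and y = [1, 1, -1, -1, 1, 1], no single stump fits the labels (they change sign twice), yet `adaboost(X, y, rounds=3)` classifies all six training points correctly.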
  13. How Good is Ensemble?
      [Figure: error rate (y-axis, 0 to 0.7) of a single tree, bagging, and AdaBoost on data sets numbered 1 to 33 (x-axis). Source: Dietterich (1999)]
  14. How Good is Ensemble?
      [Figure: AUC (y-axis, 0.4 to 0.9) of CART, C4.5, Bagging, Random Forest, Rotation Forest, and Rotation Boost on four data sets: DIY, Bank, Telecom1, Mail-order. Source: Bock & Poel (2011)]
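The comparison on this slide uses the area under the ROC curve (AUC). One simple way to compute it, sketched here with a hypothetical `auc` function and labels coded 1 (positive) and 0 (negative), is the pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. This takes O(n_pos × n_neg) time, which is fine for illustration; a rank-based computation scales better.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so higher bars in the figure mean better separation of the two classes.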
  15. What Next?
      • Ensemble Predictive Models
      • Class-Imbalance Models
        – Gradient Boosting, EasyEnsemble, BalanceCascade, SMOTEBoost
      • Robust Predictive Models
        – Noise Ensemble
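EasyEnsemble, listed under class-imbalance models, handles a rare class by undersampling: each member is trained on all minority cases plus an equally sized random subset of the majority cases, and the members are then combined. The published method trains an AdaBoost ensemble on each subset; this simplified sketch swaps in a nearest-centroid scorer (an assumption) and averages the member scores. Label 1 is assumed to be the minority class, the majority class must have at least as many cases as the minority class, and the names `fit_centroid_scorer` and `easy_ensemble` are hypothetical.

```python
import math
import random

def fit_centroid_scorer(X, y):
    """Assumed base learner: score = distance to the class-0 centroid minus
    distance to the class-1 centroid (positive score -> looks like class 1)."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c1 = centroid([x for x, t in zip(X, y) if t == 1])
    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    return lambda x: math.dist(x, c0) - math.dist(x, c1)

def easy_ensemble(X, y, n_subsets=5, seed=0):
    """EasyEnsemble-style training: each member sees all minority (label 1) cases
    plus an equally sized random subset of the majority (label 0) cases."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    members = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))  # undersample without replacement
        Xs = minority + sampled
        ys = [1] * len(minority) + [0] * len(sampled)
        members.append(fit_centroid_scorer(Xs, ys))
    # Combiner: average the member scores; predict 1 when the average is positive.
    return lambda x: 1 if sum(m(x) for m in members) / len(members) > 0 else 0
```

Because every member trains on a balanced subset, no single member is dominated by the majority class, while the set of members together still sees many different majority cases.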
  16. Ensemble in SAS Enterprise Miner (SAS/EM)
  17. THANK YOU
  18. Bagus Sartono
      • Educational Background
        – Bachelor of Science in Stats, IPB (2000)
        – Master of Science in Stats, IPB (2004)
        – PhD in Applied Economics, University of Antwerp (2012)
      • Professional Experience
        – Lecturer, Dept. of Stats, IPB
        – Experienced trainer in analytics (Bank Indonesia, Bank Mandiri, Ganesha Cipta Informatika, CIFOR, LIPI, LPEM-UI, etc.)
