Improving the Model’s Predictive Power with Ensemble Approaches

Bagus Sartono, Lecturer at the Department of Statistics, Institut Pertanian Bogor (IPB) University,
New Trends in Research Methodology & Analytics Technology Update, Nov 28, 2012, Jakarta, Indonesia



  1. bagusco@gmail.com | bagusco@ipb.ac.id
  2. KDD Cup 2010: Overview
     • The Challenge
       – How generally or narrowly do students learn? How quickly or slowly? Will the rate of improvement vary between students? What does it mean for one problem to be similar to another?
       – Is it possible to infer the knowledge requirements of problems directly from student performance data, without human analysis of the tasks?
       – This year's challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems.
  3. KDD Cup 2010: Results
     • Winners of KDD Cup 2010 (All Teams)
       – First Place: National Taiwan University – "Feature Engineering and Classifier Ensembling for KDD Cup 2010"
       – First Runner-Up: Zhang and Su – "Gradient Boosting Machines with Singular Value Decomposition"
       – Second Runner-Up: BigChaos @ KDD – "Collaborative Filtering Applied to Educational Data Mining"
  4. Outline
     • What is Ensemble Learning?
     • Why Ensemble?
     • How Good is Ensemble?
     • What Next?
  5. Predictive Modeling
     • Widely used in many applications:
       – Business: churn modeling, scoring
       – Science: chemometrics
       – Bio-science: efficacy modeling, classification
       – Academics: admission selection, student performance
  6. Predictive Modeling (workflow diagram): a Training Set goes through Model Development to produce Prediction Rules, which are then applied to a New Data Set to generate the Prediction (illustrated in the sketch below).
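A minimal sketch of this workflow, assuming scikit-learn and a synthetic data set in place of a real application (this code is not part of the original slides):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real training set and a new data set (illustrative only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)  # model development
predictions = model.predict(X_new)                                 # prediction rules applied to new data
print("Predicted classes for the first 10 new cases:", predictions[:10])
```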
  7. Classical Approach: Model Selection – Which one is the best?
  8. New Approach?: Ensemble – Combine all models!
  9. What is Ensemble?
     • Single Expert vs. Team of Experts
  10. What is Ensemble? (diagram): a Data Set yields Training Set #1, Training Set #2, …, Training Set #k; learning produces Model #1, Model #2, …, Model #k; a Combiner merges the individual outputs into the Ensemble Prediction (see the sketch below).
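A minimal sketch of this generic flow, assuming scikit-learn, synthetic data, bootstrap resampling to create the k training sets, and a simple majority-vote combiner (none of these choices come from the slides):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

k = 25
models = []
for i in range(k):
    # Training Set #i: a bootstrap resample of the original training data.
    X_i, y_i = resample(X_tr, y_tr, replace=True, random_state=i)
    models.append(DecisionTreeClassifier(random_state=i).fit(X_i, y_i))  # Model #i

# Combiner: majority vote across the k individual predictions.
votes = np.array([m.predict(X_te) for m in models])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)

print("single tree accuracy:", accuracy_score(y_te, models[0].predict(X_te)))
print("ensemble accuracy   :", accuracy_score(y_te, ensemble_pred))
```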
  11. Types of Ensemble
      • Hybrid ensemble – combines several different learning algorithms into one prediction, e.g., combining the results of regression, trees, neural nets, and support vector machines (a small example follows below).
      • Non-hybrid ensemble – combines several learning models built with the same algorithm into one prediction.
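A rough illustration of a hybrid ensemble, assuming scikit-learn and synthetic data rather than the speaker's own setup: three different algorithms are combined by averaging their predicted class probabilities (soft voting).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Hybrid ensemble: different learning algorithms, one combined prediction.
hybrid = VotingClassifier(
    estimators=[("logreg", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=5)),
                ("svm", SVC(probability=True))],
    voting="soft")  # average the predicted probabilities
hybrid.fit(X_tr, y_tr)
print("hybrid ensemble accuracy:", accuracy_score(y_te, hybrid.predict(X_te)))
```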
  12. Well-Known Ensembles
      • Bagging – builds learning models on bootstrap samples and aggregates their predictions by averaging or majority vote.
      • Boosting (AdaBoost) – builds learning models sequentially, giving higher weight to "difficult" cases, and combines the predictions using those weights.
      • Random Forest – like bagging, but with additional random feature selection when each learning model is built.
      (A comparison sketch follows below.)
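The sketch below runs the scikit-learn implementations of these three methods (plus a single tree for reference) on a synthetic data set; it is an illustrative comparison under those assumptions, not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)

models = {
    "single tree":   DecisionTreeClassifier(random_state=3),
    "bagging":       BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=3),
    "adaboost":      AdaBoostClassifier(n_estimators=200, random_state=3),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=3),
}

# 5-fold cross-validated AUC for each method.
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:13s} mean AUC = {auc:.3f}")
```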
  13. How Good is Ensemble? (chart): error rates of a single tree, bagging, and AdaBoost across 33 data sets. Source: Dietterich (1999).
  14. How Good is Ensemble? (chart): AUC of CART, C4.5, Bagging, Random Forest, Rotation Forest, and Rotation Boost on the DIY, Bank, Telecom1, and Mail-order data sets. Source: De Bock & Van den Poel (2011).
  15. What Next
      • Ensemble predictive models
      • Class-imbalance models – Gradient Boosting, EasyEnsemble, BalanceCascade, SMOTEBoost (see the sketch below)
      • Robust predictive models – noise ensembles
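As a hedged sketch of the class-imbalance idea, the code below applies EasyEnsemble-style undersampling with scikit-learn on a synthetic imbalanced data set; it simplifies the published algorithm by using plain decision trees instead of AdaBoost members, so it illustrates the idea rather than reproducing any method named above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic imbalanced data: roughly 5% positives (illustrative only).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.RandomState(0)
pos = np.where(y_tr == 1)[0]
neg = np.where(y_tr == 0)[0]

models = []
for _ in range(10):
    # Each member sees every minority case plus an equally sized random
    # subset of the majority class (EasyEnsemble-style undersampling).
    idx = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    models.append(DecisionTreeClassifier(max_depth=5).fit(X_tr[idx], y_tr[idx]))

# Combine by averaging the members' predicted probabilities of the positive class.
proba = np.mean([m.predict_proba(X_te)[:, 1] for m in models], axis=0)
print("ensemble AUC on the imbalanced test set:", round(roc_auc_score(y_te, proba), 3))
```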
  16. Ensemble in SAS/EM
  17. THANK YOU
  18. Bagus Sartono
      Educational Background:
      • Bachelor of Science in Stats – IPB (2000)
      • Master of Science in Stats – IPB (2004)
      • PhD in Applied Economics – University of Antwerp (2012)
      Professional Experience:
      • Lecturer – Dept of Stats, IPB
      • Experienced trainer in analytics (Bank Indonesia, Bank Mandiri, Ganesha Cipta Informatika, CIFOR, LIPI, LPEM-UI, etc.)
