bagusco@gmail.com
bagusco@ipb.ac.id
KDD Cup 2010: Overview
• The Challenge
   – How generally or narrowly do students learn? How quickly or
     slowly? Will the rate of improvement vary between students?
     What does it mean for one problem to be similar to another?

   – Is it possible to infer the knowledge requirements of problems
     directly from student performance data, without human analysis
     of the tasks?

   – This year's challenge asks you to predict student performance
     on mathematical problems from logs of student interaction with
     Intelligent Tutoring Systems.
KDD Cup 2010: Results
• Winners of KDD Cup 2010: All Teams
   – First Place: National Taiwan University
     Feature engineering and classifier ensembling for KDD CUP
     2010

   – First Runner Up: Zhang and Su
     Gradient Boosting Machines with Singular Value Decomposition

   – Second Runner Up: BigChaos @ KDD
     Collaborative Filtering Applied to Educational Data Mining
Outline
•   What is Ensemble Learning?
•   Why Ensemble?
•   How good is Ensemble?
•   What next?
Predictive Modeling
• Widely-used in many applications:
  – Business
     • Churn modeling, Scoring
  – Science
     • Chemometrics
  – Bio-Science
     • Efficacy modeling, Classification
  – Academics
     • Admission selection, student performance
Predictive Modeling
• The basic workflow:
  – Training Set → Model Development → Predictive Rules
  – Predictive Rules applied to a New Data Set → Prediction
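The workflow above can be sketched in a few lines. scikit-learn and the iris data are assumptions here for illustration; the slides do not prescribe any particular toolkit or dataset.

```python
# Minimal sketch of the predictive-modeling flow: fit predictive rules
# on a training set, then apply them to new data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Split into a "Training Set" and a held-out "New Data Set"
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0)  # Model Development
model.fit(X_train, y_train)                     # -> Predictive Rules
predictions = model.predict(X_new)              # -> Prediction
print(predictions[:5])
```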
Classical Approach: Model Selection
• Fit several candidate models. Which one is the best?
New Approach?: Ensemble
• Combine all models!
What is Ensemble?
• Single Expert   vs   Team of Experts
What is Ensemble?
• The ensemble pipeline:
  – Resample the Data Set into Training Set #1, Training Set #2, …,
    Training Set #k
  – Run a learning step on each to obtain Model #1, Model #2, …,
    Model #k
  – A Combiner merges the k predictions into one Ensemble Prediction
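The pipeline above can be sketched from scratch. This is a toy illustration under assumed details (1-D data, threshold "stump" learners, k = 11); real ensembles use stronger base learners.

```python
# From-scratch ensemble: draw k bootstrap training sets, fit one simple
# learner (a 1-D threshold "stump") on each, combine by majority vote.
import random
from collections import Counter

random.seed(0)

# Toy 1-D data: class 1 tends to have larger x (classes overlap in 4.0-4.9)
data = [(x / 10, 0) for x in range(0, 50)] + \
       [(x / 10, 1) for x in range(40, 100)]

def fit_stump(sample):
    """Pick the threshold that best separates the two classes."""
    best_t, best_err = None, float("inf")
    for t in {x for x, _ in sample}:
        err = sum((x >= t) != (y == 1) for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bootstrap(sample):
    # Training Set #i: sample with replacement, same size as the data
    return [random.choice(sample) for _ in sample]

k = 11
thresholds = [fit_stump(bootstrap(data)) for _ in range(k)]  # Model #1..#k

def ensemble_predict(x):
    votes = [int(x >= t) for t in thresholds]     # each model votes
    return Counter(votes).most_common(1)[0][0]    # Combiner: majority vote

print(ensemble_predict(1.0), ensemble_predict(9.0))
```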
Types of Ensemble
• Hybrid Ensemble
  – Combining several different learning algorithms into
    one prediction
  – e.g. combining the results of a regression, a tree, a neural
    net, and a support vector machine

• Non-Hybrid Ensemble
  – Combining several learning models from the same
    algorithm into one prediction
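A hybrid ensemble as described above can be built by combining different learning algorithms into one vote. scikit-learn's `VotingClassifier` is used here as one possible combiner, an assumption on my part; the slides do not tie the idea to any library.

```python
# Hybrid ensemble: logistic regression, a tree, and an SVM combined
# into a single prediction by majority ("hard") voting.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

hybrid = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(random_state=0)),
    ],
    voting="hard",  # each algorithm gets one vote; majority wins
)
hybrid.fit(X, y)
print(hybrid.score(X, y))
```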
Well-Known Ensembles
• Bagging
  – Generate learning models for the bootstrap samples
  – Aggregate the predictions via averaging or majority-vote
• Boosting (AdaBoost)
  – Generate learning models sequentially, giving higher weight
    to ‘difficult’ cases
  – Combine the predictions via a weighted vote
• Random Forest
  – Similar to bagging, but adds random feature selection to the
    generation of each learning model
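The three well-known ensembles above can be instantiated directly. scikit-learn and the breast-cancer benchmark are assumptions for illustration; any implementation of bagging, AdaBoost, and random forests would do.

```python
# Bagging, AdaBoost, and Random Forest side by side, each scored with
# 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Bagging: models fit on bootstrap samples, predictions aggregated
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    # AdaBoost: sequential models, 'difficult' cases get higher weight
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Random forest: bagging plus random feature selection
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```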
How Good is Ensemble?
[Figure: error rates of a single tree, bagging, and AdaBoost compared
across 33 benchmark datasets. Source: Dietterich (1999)]
How Good is Ensemble?
[Figure: AUC of CART, C4.5, Bagging, Random Forest, Rotation Forest,
and Rotation Boost on four datasets (DIY, Bank, Telecom1, Mail-order).
Source: Bock & Poel (2011)]
What Next
• Ensemble Predictive Models

• Class-Imbalance Models
  – Gradient Boosting, EasyEnsemble, BalanceCascade, SMOTEBoost

• Robust Predictive Models
  – Noise Ensemble
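The class-imbalance direction above can be illustrated with the core idea behind EasyEnsemble: train several learners, each on all minority cases plus an equally sized random draw from the majority class, then combine by majority vote. This is a hedged sketch of the principle under assumed details (synthetic data, plain decision trees), not the published algorithm.

```python
# EasyEnsemble-style sketch: balanced subsampling of the majority class
# for each base learner, majority vote as the combiner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)  # heavily imbalanced

minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]

models = []
for _ in range(10):
    # Balanced subsample: all minority cases + same-sized majority draw
    subset = np.concatenate(
        [minority, rng.choice(majority, size=len(minority), replace=False)])
    models.append(DecisionTreeClassifier(random_state=0)
                  .fit(X[subset], y[subset]))

# Combiner: average the 10 votes, predict minority when >= half agree
votes = np.mean([m.predict(X) for m in models], axis=0)
pred = (votes >= 0.5).astype(int)
recall = (pred[y == 1] == 1).mean()
print(f"minority recall: {recall:.2f}")
```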
Ensemble in SAS/EM
THANK YOU
Bagus Sartono
Educational Background
• Bachelor of Science in Stats – IPB (2000)
• Master of Science in Stats – IPB (2004)
• PhD in Applied Economics – University of Antwerp (2012)

Professional Experience
• Lecturer – Dept of Stats, IPB
• Experienced Trainer in Analytics (Bank Indonesia, Bank Mandiri,
  Ganesha Cipta Informatika, CIFOR, LIPI, LPEM-UI, etc.)

Improving the Model’s Predictive Power with Ensemble Approaches
