An Overview of Adaptive Boosting – AdaBoost
Presented By
Kato Mivule
Dr. Manohar Mareboyana, Professor
Data Mining - Spring 2013
Computer Science Department
Bowie State University
OUTLINE
• Introduction
• How AdaBoost Works
• The experiment
• Results
• Conclusion and Discussion
Adaptive Boosting – AdaBoost
• Adaptive Boosting (AdaBoost) was proposed by Freund and Schapire (1995).
• AdaBoost is a machine learning classifier that, over several iterations, combines weak learners to generate a new learner with improved performance.
• AdaBoost is adaptive in that with each iteration a new weak learner is added to the ensemble and the instance weights are re-tuned, with priority given to data misclassified in prior iterations (see the sketch after these bullets).
• AdaBoost is less vulnerable to over-fitting than many classifiers, but it is sensitive to noisy data and outliers.
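
To make the weight update concrete, here is a minimal sketch of the boosting loop in Python. It assumes numpy arrays with binary labels coded as -1/+1 and borrows scikit-learn's depth-one tree as the stump; the names adaboost_fit and adaboost_predict are illustrative, not part of any library.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=3):
        # AdaBoost.M1 sketch for numpy arrays with labels y in {-1, +1}.
        n = len(y)
        w = np.full(n, 1.0 / n)                   # start with uniform instance weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)   # weak learner: one split
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = w[pred != y].sum()              # weighted training error (w sums to 1)
            if err >= 0.5:                        # no better than chance: stop boosting
                break
            err = max(err, 1e-12)                 # guard against a perfect stump
            alpha = 0.5 * np.log((1 - err) / err)   # model weight: lower error, larger vote
            w *= np.exp(-alpha * y * pred)        # up-weight misclassified instances
            w /= w.sum()                          # renormalize to a distribution
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        # Final classifier: sign of the alpha-weighted vote over all stumps.
        votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(votes)

Run with n_rounds = 3, this mirrors the three weighted inner models in the RapidMiner output shown later.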
AdaBoost Fit Ensemble
An overview of the AdaBoost Fit Ensemble procedure.
How AdaBoost Works – Weak Learners
• Decision Stump
• For this overview, we chose the Decision Stump as our weak learner.
• A Decision Stump is a decision tree with only a single split.
• The resulting tree can be used to classify unseen (untrained) instances.
• Each leaf node holds a class label.
• Each non-leaf node is a decision node.
How AdaBoost Works – Weak Learners
• How a Decision Stump chooses the best attribute (a sketch follows this list):
• Information gain: the attribute with the highest information gain is chosen.
• Gain ratio: information gain normalized by the intrinsic information of the split.
• Gini index: the split with the lowest Gini impurity is chosen.
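
As an illustration of the information-gain criterion, the sketch below searches every attribute and candidate threshold for the single split with the highest gain. It is a simplified stand-in, not RapidMiner's Decision Stump implementation.

    import numpy as np

    def entropy(y):
        # Shannon entropy of a label vector.
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def best_stump_split(X, y):
        # Try every attribute and candidate threshold; keep the split
        # with the highest information gain (largest drop in entropy).
        base = entropy(y)
        best = (None, None, -1.0)          # (attribute index, threshold, gain)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue
                after = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(y)
                gain = base - after
                if gain > best[2]:
                    best = (j, t, gain)
        return best

Gain ratio divides this gain by the intrinsic information of the split, and the Gini index swaps entropy for Gini impurity; the search itself is the same.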
AdaBoost – the experiment
• For illustration purposes, we utilized RapidMiner's AdaBoost functionality.
• We used the UCI Breast Cancer Wisconsin dataset, with 643 data points.
• We employed 10-fold cross-validation.
AdaBoost – the experiment
• We used RapidMiner’s Decision Stump as our weak learner (an analogous Python setup is sketched below).
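
For readers without RapidMiner, an analogous experiment can be sketched with scikit-learn. Note the assumptions: scikit-learn ships the UCI Breast Cancer Wisconsin (Diagnostic) set, a close relative of the Original set used here, so the numbers will not match the RapidMiner run exactly.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)    # stand-in for the UCI Original set

    # AdaBoostClassifier's default base estimator is a depth-1 tree,
    # i.e. a decision stump; three rounds mirror the model shown later.
    model = AdaBoostClassifier(n_estimators=3)

    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print("mean accuracy: %.4f" % scores.mean())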
AdaBoost – Results
Generated AdaBoost Model
The following AdaBoost Model was generated:
AdaBoost (prediction model for label Class)
Number of inner models: 3
Embedded model #0 (weight: 2.582):
Uniformity of Cell > 3.500: 4 {2=11, 4=202}
Uniformity of Cell ≤ 3.500: 2 {2=433, 4=37}
Embedded model #1 (weight: 1.352):
Uniformity of Cell Shape > 1.500: 4 {2=100, 4=237}
Uniformity of Cell Shape ≤ 1.500: 2 {2=344, 4=2}
Embedded model #2 (weight: 1.016):
Clump Thickness > 8.500: 4 {2=0, 4=83}
Clump Thickness ≤ 8.500: 2 {2=444, 4=156}
AdaBoost – Results
• AdaBoost using Decision Stumps – classification accuracy of 93.12%.
• Decision Stump without AdaBoost – classification accuracy of 92.97%.
AdaBoost – Results
• AdaBoost Confusion Matrix – classification accuracy of 93.12%.
• Decision Stump Confusion Matrix – classification accuracy of 92.97% (a sketch of the computation follows).
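
The confusion matrices themselves were produced in RapidMiner and are not reproduced here; a comparable matrix can be sketched in Python from out-of-fold predictions, under the same dataset caveat as above.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import cross_val_predict

    X, y = load_breast_cancer(return_X_y=True)
    model = AdaBoostClassifier(n_estimators=3)

    # Out-of-fold predictions from 10-fold CV, so every instance is
    # scored by a model that never saw it during training.
    pred = cross_val_predict(model, X, y, cv=10)
    print(confusion_matrix(y, pred))   # rows: true class; columns: predicted class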
AdaBoost – Results
The Receiver Operating Characteristic (ROC):
• The ROC plots the false positive rate (1 − specificity) on the X-axis: the probability of predicting target = 1 when its true value is 0.
• The true positive rate (sensitivity) is on the Y-axis: the probability of predicting target = 1 when its true value is 1.
• In the ideal case, the curve rises quickly toward the top-left, indicating that the model makes correct predictions at a low false positive rate.
Area Under the Curve (AUC):
• AUC is the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance.
• The AUC summarizes the overall performance of the classifier.
• A higher AUC indicates better performance.
• An AUC of 0.50 indicates random performance.
• An AUC of 1.00 indicates perfect performance (a short computation sketch follows the plot captions below).
The ROC/AUC plot for AdaBoost – with an AUC of 0.975.
The ROC/AUC plot for the Decision Stump – with an AUC of 0.911.
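
A sketch of how such an ROC curve and AUC can be computed in Python, again on the stand-in scikit-learn dataset rather than the exact data used above:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    model = AdaBoostClassifier(n_estimators=3).fit(X_tr, y_tr)
    score = model.predict_proba(X_te)[:, 1]       # predicted P(positive class)

    fpr, tpr, _ = roc_curve(y_te, score)          # points along the ROC curve
    print("AUC: %.3f" % roc_auc_score(y_te, score))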
CONCLUSION
• As the preliminary results show, AdaBoost performs better than a single Decision Stump, both in classification accuracy (93.12% vs. 92.97%) and in AUC (0.975 vs. 0.911).
• However, much of AdaBoost's success depends largely on fine-tuning the parameters of the classifier and on the weak learner that is chosen.
References
1. Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, Aug. 1997.
2. T. G. Dietterich, "Ensemble Methods in Machine Learning," Lecture Notes in Computer Science, vol. 1857, pp. 1-15, 2000.
3. K. Mivule, C. Turner, and S.-Y. Ji, "Towards a Differential Privacy and Utility Preserving Machine Learning Classifier," Procedia Computer Science, vol. 12, pp. 176-181, 2012.
4. T. Fawcett, "An Introduction to ROC Analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, 2006.
5. K. Bache and M. Lichman, "Breast Cancer Wisconsin (Original) Data Set," UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2013.
6. MATLAB, "AdaBoost - MATLAB." Online. Accessed: May 3, 2013. Available: http://www.mathworks.com/discovery/adaboost.html
7. MATLAB, "Ensemble Methods :: Nonparametric Supervised Learning (Statistics Toolbox)." Online. Accessed: May 3, 2013. Available: http://www.mathworks.com/help/toolbox/stats/bsvjye9.html#bsvjyi5
8. "Model Evaluation – Classification (ROC Charts)." Online. Accessed: May 3, 2013. Available: http://chem-eng.utoronto.ca/~datamining/dmc/model_evaluation_c.htm
9. MedCalc, "ROC Curve Analysis in MedCalc." Online. Accessed: May 3, 2013. Available: http://www.medcalc.org/manual/roc-curves.php
THANK YOU.
Contact: kmivule at gmail dot com
