Benefits of Ensembles
1. Statistical reasons
• Combining models improves generalization performance
• Reduces the risk of choosing a single poor model
• [Hansen and Salamon, 1990]
• 25 base learners
• Error rate of each classifier: 𝜀 = 0.35
• Ensemble (majority vote, wrong only when at least 13 of the 25 err): $\sum_{i=13}^{25} \binom{25}{i}\,\varepsilon^{i}(1-\varepsilon)^{25-i} \approx 0.06$ (computed in the sketch below)
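The 0.06 figure can be reproduced directly from the binomial formula above, under the (strong) assumption that the 25 base classifiers err independently. The following Python snippet is only an illustrative check of that arithmetic, not code from the cited paper:

```python
from math import comb

# Majority vote of T independent base classifiers, each wrong with probability eps.
# The ensemble errs only when at least 13 of the 25 votes are wrong.
T, eps = 25, 0.35
ensemble_error = sum(
    comb(T, i) * eps**i * (1 - eps) ** (T - i)
    for i in range(T // 2 + 1, T + 1)  # i = 13, ..., 25
)
print(round(ensemble_error, 3))  # ~0.06, versus 0.35 for any single classifier
```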
Benefits of Ensembles
2. Volumes of data
• Too Large
• A single model cannot handle all of the data
• The data can be split and learned from in parts
• Too Little
• Resampling (see the bootstrap sketch below)
• Model diversity
Image source: https://www.rhipe.com/big-data-and-the-cloud/
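When the data are too little, one common way to obtain model diversity is to train each base learner on a bootstrap resample of the original set. The sketch below only illustrates that resampling step; the function name and interface are invented for this example:

```python
import random

def bootstrap_training_sets(data, n_learners, seed=0):
    """Draw one bootstrap sample (with replacement) per base learner.

    Each resample has the same size as the original data but a different
    composition, giving the base learners diverse training sets.
    """
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_learners)]

# Example: three resampled training sets drawn from a small dataset of 10 points.
for sample in bootstrap_training_sets(list(range(10)), n_learners=3):
    print(sample)
```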
History of Ensemble Methods
• Principle of Multiple Explanations^3
• if several theories are consistent with the observed data, retain
them all
• Ockhamʼs razor^4
• among competing hypotheses, the one with the fewest assumptions
should be selected
• No Free Lunch[Wolpert, 1996][Wolpert and Macready, 1997]
• all algorithms that search for an extremum of a cost function
perform exactly the same, when averaged over all possible cost
functions
3 http://www.gutenberg.org/ebooks/785?msg=welcome_stranger
4 http://plato.stanford.edu/entries/ockham/#4.1
History of Ensemble Methods
Pioneering research
1. [Hansen and Salamon, 1990]
• experimental
• In many cases, the prediction obtained by combining classifiers is more accurate than the prediction of the best single classifier
2. [Schapire, 1990]
• theoretical
• Proved that weak learners can be boosted into strong learners
• Introduced Boosting
History of Ensemble Methods
• 3 threads of early contributions
1. Combining classifiers: Pattern Recognition
• Studies strong classifiers and tries to design powerful combination rules that yield an even stronger combined classifier (a minimal voting sketch follows this slide)
2. Ensemble of weak learners: Machine Learning
• Tries to design algorithms that boost weak learners into strong learners
• Boosting, Bagging
• Introduced the theoretical understanding of why weak learners can become strong learners
3. Mixture of experts: Neural Network
• Divide-and-conquer structure
• Learns a mixture of parametric models and uses a combination rule to obtain the overall solution
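For the first thread, the simplest combination rule is plurality (majority) voting over the base classifiers' predicted labels. This is a minimal sketch with made-up predictions, not a method from any particular paper:

```python
from collections import Counter

def plurality_vote(labels):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(labels).most_common(1)[0][0]

# Three hypothetical classifiers vote on one sample.
print(plurality_vote(["cat", "dog", "cat"]))  # -> cat
```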
Ensemble Pruning
• Individually trained base learners:
• select a subset of them rather than combining them all
• [Zhou et al., 2002]
• Many could be better than all
• A simple sanity check is not ensemble pruning
• Boosting pruning [Margineantu and Dietterich, 1997]
Ensemble Pruning
[Tsoumakas et al., 2009]
• A categorization of ensemble pruning methods
• Ranking based
• Build the ensemble once according to an evaluation function, then swap base learners in and out based on their ranking (see the sketch after this slide)
• Clustering based
• Cluster the models and prune within each cluster
• Optimization based
• Find the subset of base learners that optimizes a metric indicative of generalization performance
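As a concrete illustration of the ranking-based family, the sketch below greedily adds the base learner whose inclusion most reduces the majority-vote error on a held-out validation set, in the spirit of reduce-error pruning [Margineantu and Dietterich, 1997]. The published procedures differ in their details; the interface here is invented for the example:

```python
import numpy as np

def reduce_error_pruning(val_preds, y_val, k):
    """Greedy reduce-error pruning (sketch).

    val_preds : (n_learners, n_samples) array of 0/1 predictions on a validation set
    y_val     : (n_samples,) array of true 0/1 labels
    k         : number of base learners to keep

    Starting from the empty subset, repeatedly add the learner whose inclusion
    minimizes the majority-vote error of the current subset on the validation set.
    """
    selected = []
    for _ in range(k):
        best_i, best_err = None, float("inf")
        for i in range(val_preds.shape[0]):
            if i in selected:
                continue
            votes = val_preds[selected + [i]].mean(axis=0) >= 0.5
            err = float(np.mean(votes != y_val))
            if err < best_err:
                best_i, best_err = i, err
        selected.append(best_i)
    return selected

# Toy example: 5 base learners, 6 validation samples.
rng = np.random.default_rng(0)
preds = rng.integers(0, 2, size=(5, 6))
truth = rng.integers(0, 2, size=6)
print(reduce_error_pruning(preds, truth, k=3))  # indices of the 3 selected learners
```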
Ensemble Pruning
• Ranking based
• [Margineantu and Dietterich, 1997]
• Reduce-error pruning
• Kappa pruning
• Kappa-error diagram pruning
• [Martínez-Muñoz and Suárez, 2004]
• Complementariness pruning
• Margin distance pruning
• [Martínez-Muñoz and Suárez, 2006]
• Orientation pruning
• [Martínez-Muñoz and Suárez, 2007]
• Boosting-based pruning
• [Partalas et al., 2009]
• Reinforcement learning pruning
• Clustering based
• [Giacinto et al., 2000]
• Hierarchical agglomerative clustering
• [Lazarevic and Obradovic, 2001]
• K-means clustering
• [Bakker and Heskes, 2003]
• Deterministic annealing
• Optimization based
• [Zhou et al., 2002]
• Genetic algorithm
• SDP [Zhang et al., 2006]
• RSE [Li and Zhou, 2009]
• MAP [Chen et al., 2006, 2009]
References
• K. Ali, “A comparison of methods for learning and combining evidence from multiple models,” Tech. Rep. 95-47, 1995.
• K. M. Ali and M. J. Pazzani, “Error reduction through learning multiple descriptions,” Mach. Learn., vol. 202, pp. 173‒202, 1996.
• E. L. Allwein, R. Schapire, and Y. Singer, “Reducing multiclass to binary: A unifying approach for margin classifiers,” J. Mach. Learn. …, vol. 1, pp. 113‒141, 2001.
• J. A. Aslam and S. E. Decatur, “General bounds on statistical query learning and PAC learning with noise via hypothesis boosting,” Proc. 1993 IEEE 34th Annu. Found. Comput. Sci., vol. 118, pp. 85‒118, 1993.
• B. Bakker and T. Heskes, “Clustering ensembles of neural network models,” Neural Networks, vol. 16, no. 2, pp. 261‒269, 2003.
• E. Bauer, R. Kohavi, P. Chan, S. Stolfo, and D. Wolpert, “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants,” Mach. Learn., vol. 36, no. August, pp. 105‒139, 1999.
• J. K. Bradley and R. Schapire, “Filterboost: Regression and classification on large datasets,” Adv. Neural Inf. Process. Syst., vol. 20, no. 1997, pp. 185‒192, 2008.
• L. Breiman, “Bagging predictors: Technical Report No. 421,” Mach. Learn., vol. 140, no. 2, p. 19, 1994.
• L. Breiman, “Stacked regressions,” Mach. Learn., vol. 24, no. 1, pp. 49‒64, 1996.
• L. Breiman, “Out-of-Bag Estimation,” Tech. Rep., pp. 1‒13, 1996.
• L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5‒32, 2001.
• P. Bühlmann and B. Yu, “Boosting with the L2 Loss,” J. Am. Stat. Assoc., vol. 98, no. 462, pp. 324‒339, 2003.
• H. Chen, P. Tino, and X. Yao, “A Probabilistic Ensemble Pruning Algorithm,” Sixth IEEE Int. Conf. Data Min. - Work., no. 1, pp. 878‒882, 2006.
• H. Chen, P. Tiňo, and X. Yao, “Predictive ensemble pruning by expectation propagation,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 7, pp. 999‒1013, 2009.
• A. Cutler and G. Zhao, “PERT - Perfect Random Tree Ensembles,” Comput. Sci. Stat., vol. 33, pp. 490‒497, 2001.
• A. Demiriz et al., “Linear Programming Boosting via Column Generation,” pp. 1‒22, 2000.
• T. G. Dietterich, “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization,” Mach. Learn., vol. 40, no. 2, pp. 139‒157, 2000.
• T. G. Dietterich, “Ensemble Methods in Machine Learning,” Mult. Classif. Syst., vol. 1857, pp. 1‒15, 2000.
• T. G. Dietterich and G. Bakiri, “Solving Multiclass Learning Problems via Error-Correcting Output Codes,” Jouranal Artifical Intell. Res., vol. 2, pp. 263‒286, 1995.
• C. Domingo and O. Watanabe, “MadaBoost: A Modification of AdaBoost,” Conf. Comput. Learn. Theory, pp. 180‒189, 2000.
References
• P. Domingos, “Bayesian Averaging of Classifiers and the Overfitting Problem,” Proc. ICML, pp. 223‒230, 2000.
• Y. Freund, “Boosting a Weak Learning Algorithm by Majority,” Information and Computation, vol. 121, no. 2. pp. 256‒285, 1995.
• Y. Freund, “Data Filtering and Distribution Modeling Algorithms for Machine Learning,” no. September, 1993.
• Y. Freund, “An adaptive version of the boost by majority algorithm,” Mach. Learn., vol. 43, no. 3, pp. 293‒318, 2001.
• Y. Freund, “A more robust boosting algorithm,” Mach. Learn., vol. arXiv:0905, pp. 1‒9, 2009.
• Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” vol. 139, pp. 23‒37, 1995.
• J. H. Friedman and P. Hall, “On bagging and nonlinear estimation,” J. Stat. Plan. Inference, vol. 137, no. 3, pp. 669‒683, 2007.
• J. Friedman, T. Hastie, and R. Tibshirani, “Additive Logistic Regression: a Statistical View of Boosting,” Int. J. Qual. Methods, vol. 16, no. 1, pp. 1‒71, 2000.
• G. Fumera and F. Roli, “A theoretical and experimental analysis of linear combiners for multiple classifier systems,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 942‒956, 2005.
• P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees,” Mach. Learn., vol. 63, no. 1, pp. 3‒42, 2006.
• G. Giacinto, F. Roli, and G. Fumera, “Design of effective multiple classifier systems by clustering of classifiers,” Proc. 15th Int. Conf. Pattern Recognition. ICPR-2000, vol. 2, 2000.
• L. K. Hansen and P. Salamon, “Neural Network Ensembles,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. October, pp. 993‒1001, 1990.
• T. K. Ho, J. J. Hull, and S. N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 1, pp. 66‒75, 1994.
• Y. S. Huang and C. Y. Suen, “A method of combining multiple experts for the recognition of unconstrained handwritten numerals,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 1, pp. 90‒94, 1995.
• M. Kearns and L. Valiant, “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata,” J. ACM, vol. 41, no. 1, pp. 67‒95, 1994.
• J. Kittler, M. Hatef, and R. P. W. Duin, “Combining classifiers,” Proc. Int. Conf. Pattern Recognit., vol. 2, no. 3, pp. 897‒901, 1996.
• L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, “Decision templates for multiple classifier fusion: an experimental comparison,” Pattern Recognit., vol. 34, no. 2, pp. 299‒314, 2001.
• L. I. Kuncheva and J. J. Rodríguez, “A weighted voting framework for classifiers ensembles,” Knowl. Inf. Syst., vol. 38, no. 2, pp. 259‒275, 2014.
• A. Lazarevic and Z. Obradovic, “Effective pruning of neural network classifier ensembles,” IJCNNʼ01. Int. Jt. Conf. Neural Networks. Proc. (Cat. No.01CH37222), vol. 2, no. January, pp. 796‒801, 2001.
• N. Li and Z. H. Zhou, “Selective ensemble under regularization framework,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 5519 LNCS, pp. 293‒303, 2009.
References
• H. Lin and L. Li, “Support Vector Machinery for Infinite Ensemble Learning,” J. Mach. Learn. Res., vol. 9, pp. 285‒312, 2008.
• F. T. Liu, K. M. Ting, Y. Yu, and Z. H. Zhou, “Spectrum of Variable-Random Trees,” J. Artif. Intell. Res., vol. 32, pp. 355‒384, 2008.
• D. D. Margineantu and T. G. Dietterich, “Pruning Adaptive Boosting,” Proc. Fourteenth Int. Conf. Mach. Learn., pp. 211‒218, 1997.
• G. Martínez-Muñoz and A. Suárez, “Using boosting to prune bagging ensembles,” Pattern Recognit. Lett., vol. 28, no. 1, pp. 156‒165, 2007.
• G. Martínez-Muñoz and A. Suárez, “Pruning in Ordered Bagging Ensembles,” Proc. 23rd Int. Conf. Mach. Learn., pp. 609‒616, 2006.
• G. Martínez-Muñoz and A. Suárez, “Aggregation ordering in bagging,” Proc. {IASTED} Int. Conf. Artif. Intell. Appl., pp. 258‒263, 2004.
• I. Mukherjee and R. E. Schapire, “A theory of multiclass boosting,” J. Mach. Learn. Res., vol. 14, no. 1, pp. 437‒497, 2011.
• A. Narasimhamurthy, “Theoretical bounds of majority voting performance for a binary classification problem,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 12, pp. 1988‒1995, 2005.
• D. W. Opitz and R. Maclin, “Popular Ensemble Methods: An Empirical Study,” J. Artif. Intell. Res., vol. 11, pp. 169‒198, 1999.
• I. Partalas, G. Tsoumakas, and I. Vlahavas, “Pruning an ensemble of classifiers via reinforcement learning,” Neurocomputing, vol. 72, no. 7‒9, pp. 1900‒1909, 2009.
• R. Polikar, “Ensemble based systems in decision making,” Circuits Syst. Mag. IEEE, vol. 6, no. 3, pp. 21‒45, 2006.
• S. Raudys and F. Roli, “The behavior knowledge space fusion method: Analysis of generalization error and strategies for performance improvement,” Mult. Classif. Syst. 4th Int. Work. Lect. Notes Comput. Sci. Vol. 2709, pp. 55‒64, 2003.
• M. Robnik-Sikonja, “Improving random forests,” Mach. Learn. ECML 2004, p. 12, 2004.
• J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso, “Rotation forest: A New classifier ensemble method,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1619‒1630, 2006.
• R. E. Schapire, “The Strength of Weak Learnability (Extended Abstract),” Mach. Learn., vol. 227, no. October, pp. 28‒33, 1990.
• R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions,” Mach. Learn., vol. 37, no. 3, pp. 297‒336, 1999.
• A. K. Seewald, “How to Make Stacking Better and Faster While Also Taking Care of an Unknown Weakness,” Icml, no. January 2002, pp. 554‒561, 2002.
• R. Tibshirani, “Bias, variance and prediction error for classification rules,” Tech. Report, Univ. Toronto, pp. 1‒17, 1996.
• K. Tumer and J. Ghosh, “Analysis of decision boundaries in linearly combined neural classifiers,” Pattern Recognit., vol. 29, no. 2, pp. 341‒348, 1996.
• J. W. Vaughan, “CS260: Machine Learning Theory, Lecture 13: Weak vs. Strong Learning and the AdaBoost Algorithm,” lecture notes, pp. 1‒6, 2011.
References
• D. H. Wolpert and W. G. Macready, “No Free Lunch Theorems for
Optimisation,” IEEE Trans. Evol. Comput., vol. 1, no. 1, pp. 67‒82, 1997.
• D. H. Wolpert, “Stacked Generalization,” vol. 87545, no. 505, pp. 241‒259, 1992.
• D. H. Wolpert, “The Lack of A Priori Distinctions Between Learning
Algorithms,” Neural Comput., vol. 8, no. 7, pp. 1341‒1390, 1996.
• L. Xu, A. Krzyżak, and C. Y. Suen, “Methods of Combining Multiple Classifiers
and Their Applications to Handwriting Recognition,” IEEE Trans. Syst. Man
Cybern., vol. 22, no. 3, pp. 418‒435, 1992.
• Y. Zhang, S. Burer, and W. N. Street, “Ensemble pruning via semi-definite
programming,” J. Mach. Learn. Res., vol. 7, pp. 1315‒1338, 2006.
• Z.-H. Zhou, J. Wu, and W. Tang, “Ensembling neural networks: Many could be
better than all,” Artif. Intell., vol. 137, no. 1‒2, pp. 239‒263, May 2002.
• J. Zhu, H. Zou, S. Rosset, and T. Hastie, “Multi-class AdaBoost,” pp. 0‒20, 2006.