15. First, the results confirm the experiments in (Caruana & Niculescu-Mizil, 2006) where
boosted decision trees perform exceptionally well when dimensionality is low. In this study,
boosted trees are the method of choice for up to about 4,000 dimensions. Above that, random
forests have the best overall performance. (Random forests were the 2nd-best performing
method in the previous study.) We suspect the reason is that boosted trees are prone to
overfitting, and this becomes a serious problem in high dimensions. Random forests are
better behaved in very high dimensions: they are easy to parallelize, scale efficiently,
and perform consistently well on all three metrics.
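The contrast above can be reproduced in miniature. The sketch below (not the paper's protocol; the dataset, feature counts, and hyperparameters are illustrative assumptions) pits a random forest against gradient-boosted trees on a synthetic high-dimensional problem using scikit-learn:

```python
# Minimal sketch: random forest vs. gradient-boosted trees on a
# synthetic high-dimensional classification task. All settings here
# are illustrative, not the paper's experimental setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Many features, few of them informative -- the regime where the
# study found random forests to pull ahead of boosted trees.
X, y = make_classification(n_samples=500, n_features=1000,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1,
                            random_state=0).fit(X_tr, y_tr)
gbt = GradientBoostingClassifier(n_estimators=100,
                                 random_state=0).fit(X_tr, y_tr)

print("RF accuracy :", rf.score(X_te, y_te))
print("GBT accuracy:", gbt.score(X_te, y_te))
```

Note the `n_jobs=-1`: the forest's trees are independent, so they train on all cores in parallel, which is the parallelization advantage mentioned above; boosting is inherently sequential, since each tree fits the residuals of the previous ones.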
16. The bootstrap analysis suggests that random forests are probably the top-performing
method overall, with a 73% chance of ranking first, a 20% chance of ranking second, and
less than an 8% chance of ranking below 2nd place. The ranks of the other strong methods
are less clear, however, and there appears to be a three-way tie for 2nd place among
boosted trees, ANNs, and SVMs.
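A bootstrap analysis of this kind can be sketched in a few lines: resample the datasets with replacement many times and count how often each method comes out on top. The scores below are made-up placeholders (in the study they would be the per-problem performance numbers), so only the procedure, not the probabilities, matches the paper:

```python
# Bootstrap estimate of the probability that each method ranks first.
# The score matrix is hypothetical, purely to illustrate the procedure.
import numpy as np

rng = np.random.default_rng(0)

# rows = datasets, columns = methods
methods = ["RF", "GBT", "SVM"]
scores = np.array([
    [0.91, 0.89, 0.88],
    [0.87, 0.90, 0.85],
    [0.93, 0.92, 0.90],
    [0.88, 0.86, 0.89],
    [0.95, 0.94, 0.91],
])

n_boot = 10_000
first_counts = np.zeros(len(methods))
for _ in range(n_boot):
    # resample datasets (rows) with replacement
    sample = scores[rng.integers(0, len(scores), size=len(scores))]
    first_counts[np.argmax(sample.mean(axis=0))] += 1

p_first = first_counts / n_boot
for m, p in zip(methods, p_first):
    print(f"P({m} ranks 1st) = {p:.2f}")
```

The same loop, extended to record the full ranking of each bootstrap replicate rather than only the winner, yields the 2nd-place and below-2nd probabilities quoted above.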
23. http://blog.kaggle.com/2015/11/09/profiling-top-kagglers-gilberto-titericz-new-1-in-the-world/
"Gradient Boosting Machines are the best! Before I knew of GBM, I was a big fan of
neural networks." Asked to pick a single method: "It depends on the problem, but if I
have to pick one, then it is GBM (its XGBoost flavor)."
http://blog.kaggle.com/2015/06/22/profiling-top-kagglers-owen-zhang-currently-1-in-the-world/
"I like Gradient Boosting and Tree methods in general."
http://blog.kaggle.com/2016/02/10/profiling-top-kagglers-kazanova-new-1-in-the-world/
http://blog.kaggle.com/2016/02/22/profiling-top-kagglers-leustagos-current-7-highest-1/
"Gradient boosted trees by far!"