Histogram-based algorithm
● In the histogram-based algorithm, categorical data is handled as follows.
(https://github.com/Microsoft/LightGBM/issues/1279)
“So when #category is smaller than max_bin, the #bin is smaller than max_bin.
otherwise it use the most frequent categories and stop when use 99% data.”
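The rule quoted above can be sketched in a few lines of Python. This is an illustrative reading of the quote only, not LightGBM's actual implementation: the function name `categorical_bins`, the `coverage` parameter, and the hard cap at `max_bin` bins are assumptions made for the example. What happens to categories left without a bin is not stated in the quote, so the sketch simply leaves them unmapped.

```python
from collections import Counter

def categorical_bins(values, max_bin=255, coverage=0.99):
    """Assign a bin index to each category (hypothetical helper, sketch only)."""
    # Categories sorted by descending frequency.
    ordered = Counter(values).most_common()

    # Fewer categories than max_bin: every category gets its own bin.
    if len(ordered) < max_bin:
        return {cat: i for i, (cat, _) in enumerate(ordered)}

    # Otherwise take the most frequent categories and stop once the chosen
    # ones cover the requested share of the data (99% by default).
    # Capping at max_bin bins is an assumption of this sketch.
    total = len(values)
    bins, covered = {}, 0
    for cat, n in ordered:
        if len(bins) >= max_bin or covered >= coverage * total:
            break
        bins[cat] = len(bins)
        covered += n
    return bins
```

For example, `categorical_bins(["a"] * 5 + ["b"] * 3 + ["c"] * 2)` gives each of the three categories its own bin, while a feature with more categories than `max_bin` keeps only its most frequent ones.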
References
1. Ke, Guolin, et al. "LightGBM: A highly efficient gradient boosting decision tree." Advances in Neural Information
Processing Systems. 2017.
2. Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
3. Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Vol. 1. No. 10. New York,
NY, USA: Springer Series in Statistics, 2001.
4. Friedman, Jerome H. "Greedy function approximation: a gradient boosting machine." Annals of Statistics (2001):
1189-1232.