What is XGBoost?

최진
choijin9561@gmail.com

Kaggle & XGBoost
The paper "XGBoost: A Scalable Tree Boosting System" was published in 2016.
Of the 29 Kaggle winning solutions published in 2015, 17 used XGBoost.

Source: https://towardsdatascience.com/understanding-adaboost-2f94f22d5bfe
The basic idea of boosting (e.g. AdaBoost, Gradient Boosting) is
to combine many weak classifiers into a single strong classifier.
However, it tends to be slow and prone to overfitting.

What is XGBoost?
XGBoost is one of the main libraries implementing the gradient boosting algorithm.
It addresses the slow speed and overfitting problems of Gradient Boosting.
Features of XGBoost
• Faster than GBM.
• Based on CART (Classification And Regression Tree), so it handles both classification and regression.
• Uses parallel processing (parallelization), so training and prediction are fast.
• Flexible: offers a variety of custom optimization options.
• Uses a greedy algorithm and prunes trees automatically, which reduces overfitting.

Basic Concept of XGBoost
Main principle of a conventional decision tree
A single tree checks several criteria in turn to arrive at one classification.

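Basic principle of XGBoost
Scores are computed with multiple decision trees (tree 1, tree 2, ...) rather than a single one.
y(score) = a*tree1(x) + b*tree2(x) + error (where a and b are the tree weights, a > 0, b > 0)
Ex) A woman in a white apron: -1 + 0.9 = -0.1, so the classification is ambiguous.
In such cases the weights a and b decide how much each tree counts: if b > a, tree2 is given more weight in the score.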
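The weighted score from this slide can be sketched in a few lines of Python. The tree outputs -1 and 0.9 come from the example above; the function name `ensemble_score` and the weights a = 0.5, b = 1.0 are assumptions chosen only to illustrate the b > a case, and the error term is left out.

```python
def ensemble_score(tree_scores, weights):
    """Weighted sum of per-tree scores: a*tree1(x) + b*tree2(x) + ..."""
    return sum(w * s for w, s in zip(weights, tree_scores))

# Scores the two trees assign to the same sample (from the slide's example).
tree_scores = [-1.0, 0.9]

# Equal weights (a = b = 1): the sum is close to zero, so the result is ambiguous.
equal_score = ensemble_score(tree_scores, [1.0, 1.0])

# Hypothetical weights with b > a: tree2 dominates and the score turns positive.
weighted_score = ensemble_score(tree_scores, [0.5, 1.0])

print(equal_score, weighted_score)
```

With equal weights the score sits just below zero; once tree2 is weighted more heavily, the sign flips and the decision is no longer borderline.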

Math Formula of XGBoost
As in basic Gradient Boosting, the model's error is reduced with each round (t).
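To make the round-by-round idea concrete, here is a minimal pure-Python sketch of gradient boosting with squared-error loss, where each round fits a one-split tree (a stump) to the current residuals. It is not XGBoost's actual algorithm (no regularization, no second-order terms); the toy data, the learning rate of 0.5 and all function names are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a single feature by least squares."""
    best = None
    # Try every distinct value except the largest, so both sides stay non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, rounds=5, learning_rate=0.5):
    """Add one stump per round, each fitted to the residuals of the previous rounds."""
    predictions = [0.0] * len(xs)
    errors = []
    for _ in range(rounds):
        # For squared error, the residual is the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return errors

# Toy data (made up for this sketch).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 4.8]
errors = boost(xs, ys)
for t, err in enumerate(errors, start=1):
    print(f"round {t}: mean squared error = {err:.4f}")
```

Each printed error is no larger than the one before it, which is the behaviour the slide describes.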
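In XGBoost, the Ω (omega) term of the objective function (Obj) regulates the trees.
Ω is made up of the number of leaves (weighted by gamma) plus the leaf scores (L2 norm of the leaf weights).
Ω therefore reflects the complexity of the model (f_t).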
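For reference, the XGBoost paper listed in the sources writes the objective at round t as Obj(t) = Σ_i l(y_i, ŷ_i(t-1) + f_t(x_i)) + Ω(f_t), with Ω(f) = γT + ½ λ ‖w‖², where T is the number of leaves and w is the vector of leaf scores. The short sketch below only evaluates Ω for two hand-written trees; the function name `omega`, the leaf values, and the settings gamma = 1 and lam = 1 are assumptions for illustration.

```python
def omega(leaf_weights, gamma=1.0, lam=1.0):
    """Complexity penalty: gamma * (number of leaves) + 0.5 * lam * sum of squared leaf scores."""
    num_leaves = len(leaf_weights)
    l2_penalty = 0.5 * lam * sum(w * w for w in leaf_weights)
    return gamma * num_leaves + l2_penalty

# Two hypothetical trees, described only by their leaf scores.
small_tree = [0.5, -0.5]
large_tree = [2.0, 0.1, -1.0, 0.3]

small_penalty = omega(small_tree)
large_penalty = omega(large_tree)
print(small_penalty, large_penalty)
```

The tree with more leaves and larger leaf scores receives the larger penalty, so minimizing Obj favours simpler trees unless the extra complexity buys a real reduction in loss.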

XGBoost Python Code
https://www.kaggle.com/jinameliachoi/xgboost

Sources
• XGBoost: A Scalable Tree Boosting System (Tianqi Chen & Carlos Guestrin / University of Washington)
(http://dmlc.cs.washington.edu/data/pdf/XGBoostArxiv.pdf)
• XGBoost eXtreme Gradient Boosting github
(https://github.com/dmlc/xgboost)
• Understanding Gradient Boosting Machines
(https://towardsdatascience.com/understanding-gradient-boosting-machines-9be756fe76ab)
• What is XGBOOST?
(https://www.kaggle.com/getting-started/145362)
• Using XGBoost (XGBoost 사용하기)
(https://brunch.co.kr/@snobberys/137)
