Understanding GBM and XGBoost in Scikit-Learn
1. Ensemble
• An ensemble method creates many learners (classifiers or regressors) and learns a new hypothesis by combining them
• Combining many learners can lead to more reliable prediction results than a single learner (a minimal combining sketch follows below)
(Most ensemble methods use many learners built with the same algorithm)
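A minimal sketch of the "combine many learners" idea, using scikit-learn's VotingClassifier. The dataset and base models here are my own illustrative choices, not from the slides.

```python
# Combine two different learners by soft voting (averaging probabilities).
# Dataset and base models are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=5000)),
                ("knn", KNeighborsClassifier())],
    voting="soft",  # average the predicted probabilities of the base learners
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```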
2. Bagging and Boosting
• Bagging assigns bootstrap-sampled training data to multiple classifiers and predicts by voting on or averaging their individual predictions
  RandomForest
• Boosting trains many weak learners sequentially; each learner trains on the data and repeatedly corrects the previous errors by updating weights
  AdaBoost
  Gradient Boost
  eXtreme Gradient Boost (XGBoost)
• Generally speaking, boosting gives better prediction performance, but it takes much more time and has a somewhat higher chance of overfitting (see the comparison sketch below)
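A short sketch contrasting a bagging ensemble (RandomForest) with a boosting ensemble (AdaBoost) on the same data; the dataset and settings are illustrative choices, not from the slides.

```python
# Bagging vs boosting on the same dataset, with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each tree sees a bootstrap sample; predictions are combined by voting.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: weak learners are trained one after another, re-weighting errors.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

for name, model in [("RandomForest", bagging), ("AdaBoost", boosting)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```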
4. AdaBoost
[Figure: three weak learners over +/- samples, combined with weights 0.3, 0.5, and 0.8 into one final classifier]
Each weak learner's result is combined using its weight. For example, the first learner gets weight 0.3, the second 0.5, and the third 0.8; they are then put together to make the final prediction, as the toy example below shows.
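A toy illustration of that weighted vote. The weights 0.3, 0.5, and 0.8 are the slide's example values, not alphas computed by AdaBoost.

```python
# Weighted combination of three weak learners' votes (+1 / -1 per sample).
import numpy as np

h1 = np.array([+1, -1, +1, +1, -1])  # predictions of learner 1, weight 0.3
h2 = np.array([+1, +1, -1, -1, -1])  # predictions of learner 2, weight 0.5
h3 = np.array([+1, +1, +1, -1, -1])  # predictions of learner 3, weight 0.8

combined = 0.3 * h1 + 0.5 * h2 + 0.8 * h3
print(np.sign(combined))  # final prediction: [ 1.  1.  1. -1. -1.]
```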
5. GBM (Gradient Boosting Machine)
GBM is similar to AdaBoost.
The major difference is the way the weights are updated: GBM does this more systematically, using gradient descent.
However, it takes a lot of time, because each weak learner's weight is updated serially. A minimal sketch follows below.
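A minimal GBM sketch using scikit-learn's GradientBoostingClassifier; the dataset and hyperparameter values are illustrative choices.

```python
# Gradient boosting: each new tree fits the gradient of the loss, so the
# trees are added one after another -- training is inherently sequential.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 random_state=0)
gbm.fit(X_train, y_train)
print("GBM accuracy:", gbm.score(X_test, y_test))
```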
6. Advantages of XGBoost
XGBoost (eXtreme Gradient Boosting)
eXcellent prediction performance
Faster execution time compared with GBM
• CPU parallel processing enabled
Various enhancements
• Regularization
• Tree pruning
Various utilities
• Early stopping
• Built-in cross validation
• Built-in null value handling
7. XGBoost implementation in Python
C/C++ Native Module
• XGBoost was initially written in C/C++
Python Wrapper
• The Python package works by calling the native C/C++ module
• It has its own Python API and library
Scikit-Learn Wrapper
• Integrated with the scikit-learn framework
• Training and prediction via the fit() and predict() methods, like any other classifier in scikit-learn
• No problem using other scikit-learn modules such as GridSearchCV, thanks to the seamless integration (see the sketch below)
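A sketch of that seamless integration: XGBClassifier behaves like any other scikit-learn estimator, so it plugs straight into GridSearchCV. The dataset and parameter grid are illustrative choices.

```python
# XGBClassifier used like a regular scikit-learn estimator, incl. GridSearchCV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = XGBClassifier(n_estimators=100)
grid = GridSearchCV(clf,
                    param_grid={"max_depth": [3, 5],
                                "learning_rate": [0.05, 0.1]},
                    cv=3)
grid.fit(X_train, y_train)   # training via fit(), like any sklearn model
print(grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))
```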
8. XGBoost Python Wrapper vs XGBoost Scikit-learn Wrapper
Modules
• Python Wrapper: import xgboost as xgb
• Scikit-learn Wrapper: from xgboost import XGBClassifier
Training and test datasets
• Python Wrapper: a DMatrix object is needed, e.g. train = xgb.DMatrix(data=X_train, label=y_train); to create a DMatrix, the feature dataset and the label dataset are provided as parameters
• Scikit-learn Wrapper: uses numpy or pandas data directly
Training API
• Python Wrapper: xgb_model = xgb.train(); xgb.train() returns the trained model
• Scikit-learn Wrapper: XGBClassifier.fit()
Prediction API
• Python Wrapper: xgb_model.predict(); predict() is called on the model returned by xgb.train(); the return value is not a direct prediction result but the predicted probability
• Scikit-learn Wrapper: XGBClassifier.predict() returns direct prediction results
Feature importance visualization
• plot_importance() in both wrappers
A side-by-side sketch of the two interfaces follows below.
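A side-by-side sketch of the two interfaces from the table above; the dataset and parameter values are illustrative.

```python
# Python wrapper (DMatrix + xgb.train) vs scikit-learn wrapper (fit/predict).
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# --- Python wrapper: DMatrix + xgb.train() ---
dtrain = xgb.DMatrix(data=X_train, label=y_train)
dtest = xgb.DMatrix(data=X_test, label=y_test)
params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1}
booster = xgb.train(params, dtrain, num_boost_round=100)
proba = booster.predict(dtest)        # probabilities, not class labels
labels = (proba > 0.5).astype(int)    # threshold them yourself

# --- Scikit-learn wrapper: numpy arrays in, class labels out ---
clf = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
clf.fit(X_train, y_train)
print(clf.predict(X_test)[:5], labels[:5])  # direct class predictions
# xgb.plot_importance(booster) works for both the Booster and the wrapper
```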
9. Hyperparameters of the Python wrapper and the scikit-learn wrapper (Python Wrapper name / Scikit-Learn Wrapper name)
eta / learning_rate
• Same as GBM's learning_rate: the rate at which weights are updated over the boosting iterations. Normally set between 0 and 1. The default is 0.3 in the Python wrapper and 0.1 in the scikit-learn wrapper.
num_boost_round / n_estimators
• Same as n_estimators in scikit-learn ensembles: the number of weak learners (iteration count).
min_child_weight / min_child_weight
• Similar to min_samples_leaf in a decision tree. Used against overfitting.
max_depth / max_depth
• Same as max_depth in a decision tree: maximum tree depth.
subsample / subsample
• Same as subsample in GBM: the sampling percentage, used to keep the tree from growing too large and overfitting. With subsample=0.5, half of the total data is used to build each tree. Values from 0 to 1 are allowed, but normally 0.5 to 1 is used.
10. Comparison of XGBoost Python Wrapper and scikit-learn Wrapper hyperparameters (continued)
lambda / reg_lambda
• L2 regularization value. Default is 1. The bigger the value, the stronger the regularization. Used against overfitting.
alpha / reg_alpha
• L1 regularization value. Default is 0. The bigger the value, the stronger the regularization. Used against overfitting.
colsample_bytree / colsample_bytree
• Similar to max_features in GBM. Samples the features used to build each tree; helps against overfitting when there are too many features.
A combined configuration sketch follows below.
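Illustrative only: the same configuration expressed in both parameter vocabularies from the tables above. The values are arbitrary example settings.

```python
# The same hyperparameters in Python-wrapper and scikit-learn-wrapper spelling.
from xgboost import XGBClassifier

# Python wrapper style: a params dict, plus num_boost_round in xgb.train()
params = {
    "eta": 0.1,                # learning_rate in the sklearn wrapper
    "max_depth": 5,
    "min_child_weight": 1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1.0,             # reg_lambda (L2)
    "alpha": 0.0,              # reg_alpha (L1)
    "objective": "binary:logistic",
}
# booster = xgb.train(params, dtrain, num_boost_round=200)

# Scikit-learn wrapper style: the same settings as constructor arguments
clf = XGBClassifier(learning_rate=0.1, n_estimators=200, max_depth=5,
                    min_child_weight=1, subsample=0.8, colsample_bytree=0.8,
                    reg_lambda=1.0, reg_alpha=0.0)
```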
11. XGBoost Early Stopping
XGBoost can stop its iterations before reaching the designated count if the cost is not reduced during the specified early-stopping interval.
This can be used wisely in the hyperparameter tuning process to reduce tuning time.
If you set the early-stopping value too small, training can finish without proper optimization.
Main parameters for early stopping (a sketch follows below):
• early_stopping_rounds: the maximum number of rounds to keep training while the loss metric shows no improvement
• eval_metric: the cost evaluation metric
• eval_set: the evaluation dataset used to evaluate cost reduction
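A sketch of early stopping with the scikit-learn wrapper. Note this is version-sensitive: recent xgboost releases (1.6 and later) take early_stopping_rounds and eval_metric in the constructor, while older releases took them as fit() arguments.

```python
# Early stopping: stop boosting when validation logloss stalls for 50 rounds.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = XGBClassifier(n_estimators=1000,          # upper bound on boosting rounds
                    early_stopping_rounds=50,   # patience interval
                    eval_metric="logloss")      # cost evaluation metric
clf.fit(X_train, y_train,
        eval_set=[(X_val, y_val)])              # cost is measured on this set
print("stopped at iteration:", clf.best_iteration)
```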
12. XGBoost Wrap-up
• XGBoost (along with LightGBM) is the most used ensemble method, especially among Kagglers
• It can improve prediction performance compared with GBM, but not by a dramatic margin
• Execution time is faster than GBM's, and it supports parallel processing on multiple CPU cores
• Hyperparameter tuning is difficult because there are so many parameters, but you don't have to obsess over it, since drastic performance improvements from tuning are rare with XGBoost
• XGBoost is not a golden compass, but it is widely used in various applications, especially classification and regression