2. What is XGBoost?
● XGBoost (eXtreme Gradient Boosting) is a powerful machine learning algorithm
● It is designed to optimize both predictive performance and computational speed
● Open source
● Primary implementation in C++, created by Tianqi Chen
● One of the most widely used algorithms in Kaggle competitions
4. Prerequisites: ● Decision Tree ● Gradient Boosting
[Diagram: a decision tree showing a root node, internal nodes, and leaf nodes]
1. Nodes and Branches
- A Decision Tree consists of nodes and branches.
- Nodes represent decisions or questions based on features.
- Branches represent possible outcomes or decisions based on those questions.
2. Root Node
- The top-most node is called the root node.
3. Internal Nodes
- Nodes between the root and the leaves are internal nodes.
- They represent intermediate decisions/questions.
4. Leaf Nodes
- Terminal nodes are called leaf nodes.
- They represent the final outcomes or predictions.
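To make the node types above concrete, here is a minimal sketch using scikit-learn (the Iris dataset and the max_depth value are illustrative assumptions, not part of the original slides):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
# Load a small example dataset
data = load_iris()
# Fit a shallow tree so the root, internal, and leaf nodes are easy to spot
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(data.data, data.target)
# In the printout, the first split is the root node, nested splits are
# internal nodes, and lines ending in "class: ..." are the leaf nodes
print(export_text(tree, feature_names=data.feature_names))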
9. Prerequisites: ● Decision Tree ● Gradient Boosting
Example: the splitting process repeats at each node until we reach the final decision tree.
10. Prerequisites: ● Decision Tree ● Gradient Boosting
Gradient boosting is one of the most powerful techniques for building predictive models, and it can be
seen as a generalization of AdaBoost. The main objective of gradient boosting is to minimize the loss
function by sequentially adding weak learners, using a gradient descent optimization procedure.
Gradient boosting has three main components.
● Loss Function: The role of the loss function is to estimate how well the model's predictions
match the given data. The appropriate loss function depends on the type of problem.
● Weak Learner: A weak learner is one that performs only slightly better than random
guessing. The weak learners are usually shallow decision trees.
● Additive Model: Trees are added one at a time in an iterative, sequential process, and a
gradient descent procedure is used to minimize the loss as each tree is added. Instead of
changing the existing weak learners, we add a new parameterized decision tree that
predicts the residuals of the current model, as sketched below.
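A minimal sketch of the additive-model idea, fitting each new tree to the residuals of the current prediction (the synthetic dataset, tree depth, and learning rate are illustrative assumptions):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
# Start from a constant prediction (the mean of the targets)
pred = np.full_like(y, y.mean())
learning_rate = 0.1
for _ in range(50):
    # For squared-error loss, the negative gradient is simply the residual
    residual = y - pred
    # Fit a new weak learner (a shallow tree) to the residuals
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residual)
    # Add its shrunken prediction to the ensemble
    pred += learning_rate * tree.predict(X)
print("Training MSE:", np.mean((y - pred) ** 2))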
15. XGBoost Package in Python: Installation
Installing through pip in cmd
Installing through pip in colab/notebook code cell
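The underlying commands (standard pip usage, shown here in place of the original screenshots):
pip install xgboost
!pip install xgboost   # in a Colab/Jupyter code cell, the leading "!" runs a shell command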
16. XGBoost Package in Python: Classifier model
# Importing
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Load an example binary dataset and split it (the dataset choice is an
# illustrative assumption; the original slide does not show this step)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Initialize the XGBoost classifier (or regressor)
xgb_model = XGBClassifier(objective='binary:logistic',
                          early_stopping_rounds=10,
                          eval_metric='aucpr',
                          missing=0)
17. XGBoost Package in Python: Classifier model
# Train the model
xgb_model.fit(X_train, y_train, verbose=True,
              eval_set=[(X_test, y_test)])
# Prediction
y_pred = xgb_model.predict(X_test)
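A short follow-up for scoring the predictions on the held-out test set (accuracy_score is a standard scikit-learn metric; this step is our addition, not in the original slides):
from sklearn.metrics import accuracy_score
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")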
18. XGBoost Package in Python: Regressor model
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error as mse
# Load the California Housing dataset
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = pd.Series(housing.target, name='target')
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, random_state=42)
19. XGBoost Package in Python: Regressor model
from xgboost import XGBRegressor
# Create an XGBoost regressor
xgb_reg = XGBRegressor(objective='reg:squarederror',
                       random_state=42)
# Fit the model on the training data
xgb_reg.fit(X_train, y_train)
# Make predictions on the test data
y_pred = xgb_reg.predict(X_test)
# Calculate the Mean Squared Error
print(f"Mean Squared Error: {mse(y_test, y_pred):.2f}")
20. XGBoost: Common Parameters
❖ booster: Specifies the type of boosting model to use. It can
be one of the following:
➢ gbtree: Tree-based models (default).
➢ gblinear: Linear models.
➢ dart: Dropouts meet Multiple Additive Regression Trees.
❖ n_estimators: The number of boosting rounds (trees) to
train. Increasing this value can lead to better performance
but also longer training times.
❖ learning_rate: Controls the step size at each iteration; smaller
values shrink each tree's contribution but usually require more trees.
❖ max_depth: The maximum depth of each tree.
❖ nthread: Number of parallel threads used to run XGBoost.
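A minimal sketch combining these common parameters (the specific values are illustrative assumptions, not recommendations from the original slides):
from xgboost import XGBClassifier
model = XGBClassifier(booster='gbtree',    # tree-based models (default)
                      n_estimators=200,    # number of boosting rounds
                      learning_rate=0.1,   # step size per iteration
                      max_depth=4,         # maximum depth of each tree
                      nthread=4)           # parallel threads (n_jobs is the
                                           # scikit-learn-style alias)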
21. XGBoost: XGBClassifier Parameters
❖ objective: Specifies the learning task and corresponding
objective function. Common options include:
➢ 'binary:logistic': Binary classification.
➢ 'multi:softmax': Multiclass classification.
➢ 'multi:softprob': Multiclass classification with
probabilities.
❖ eval_metric: The evaluation metric used during training.
Common options include:
➢ 'logloss': Logarithmic loss for binary classification.
➢ 'mlogloss': Multiclass logarithmic loss.
➢ 'auc': Area under the ROC curve.
❖ num_class: Number of classes in the dataset (for multiclass
classification).
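A minimal multiclass sketch using the native xgb.train API, where num_class must be set explicitly (the scikit-learn wrapper XGBClassifier infers it from the labels); the Iris dataset is an illustrative assumption:
import xgboost as xgb
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)
params = {'objective': 'multi:softmax',   # predict class labels directly
          'eval_metric': 'mlogloss',      # multiclass logarithmic loss
          'num_class': 3}                 # Iris has three classes
booster = xgb.train(params, dtrain, num_boost_round=20)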
22. XGBoost: XGBRegressor Parameters
❖ objective: Specifies the learning task and corresponding
objective function. Common options include:
➢ 'reg:squarederror': Linear regression for mean squared
error (default).
➢ 'reg:squaredlogerror': Regression for mean squared log
error.
➢ 'reg:logistic': Logistic regression (predicts probabilities for
targets in [0, 1]).
➢ 'count:poisson': Poisson regression for count data.
❖ eval_metric: The evaluation metric used during training. Common
options include:
➢ 'rmse': Root Mean Squared Error (default for regression).
➢ 'mae': Mean Absolute Error.
➢ 'poisson-nloglik': Negative log-likelihood for Poisson regression.
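A minimal sketch pairing a regression objective with an evaluation metric, reusing the California Housing split from the earlier slides (assumption: with xgboost >= 1.6, eval_metric can be passed to the constructor):
from xgboost import XGBRegressor
reg = XGBRegressor(objective='reg:squarederror',  # default regression objective
                   eval_metric='mae',             # track MAE during training
                   n_estimators=100)
reg.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)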