SlideShare a Scribd company logo
1 of 39
Download to read offline
DataCamp Extreme Gradient Boosting with XGBoost
Welcome to the
course!
EXTREME GRADIENT BOOSTING WITH XGBOOST
Sergey Fogelson
VP of Analytics, Viacom
DataCamp Extreme Gradient Boosting with XGBoost
Before we get to XGBoost...
Need to understand the basics of
Supervised classification
Decision trees
Boosting
DataCamp Extreme Gradient Boosting with XGBoost
Supervised learning
Relies on labeled data
Have some understanding of past behavior
DataCamp Extreme Gradient Boosting with XGBoost
Supervised learning example
Does a specific image contain a person's face?
Training data: vectors of pixel values
Labels: 1 or 0
DataCamp Extreme Gradient Boosting with XGBoost
Supervised learning: Classification
Outcome can be binary or multi-class
DataCamp Extreme Gradient Boosting with XGBoost
Binary classification example
Will a person purchase the insurance package given some quote?
DataCamp Extreme Gradient Boosting with XGBoost
Multi-class classification example
Classifying the species of a given bird
DataCamp Extreme Gradient Boosting with XGBoost
AUC: Metric for binary classification models
DataCamp Extreme Gradient Boosting with XGBoost
Accuracy score and confusion matrix
DataCamp Extreme Gradient Boosting with XGBoost
Supervised learning with scikit-learn
DataCamp Extreme Gradient Boosting with XGBoost
Other supervised learning considerations
Features can be either numeric or categorical
Numeric features should be scaled (Z-scored)
Categorical features should be encoded (one-hot)
DataCamp Extreme Gradient Boosting with XGBoost
Ranking
Predicting an ordering on a set of choices
DataCamp Extreme Gradient Boosting with XGBoost
Recommendation
Recommending an item to a user
Based on consumption history and profile
Example: Netflix
DataCamp Extreme Gradient Boosting with XGBoost
Let's get to work!
EXTREME GRADIENT BOOSTING WITH XGBOOST
DataCamp Extreme Gradient Boosting with XGBoost
Introducing XGBoost
EXTREME GRADIENT BOOSTING WITH XGBOOST
Sergey Fogelson
VP of Analytics, Viacom
DataCamp Extreme Gradient Boosting with XGBoost
What is XGBoost?
Optimized gradient-boosting machine learning library
Originally written in C++
Has APIs in several languages:
Python
R
Scala
Julia
Java
DataCamp Extreme Gradient Boosting with XGBoost
What makes XGBoost so popular?
Speed and performance
Core algorithm is parallelizable
Consistently outperforms single-algorithm methods
State-of-the-art performance in many ML tasks
DataCamp Extreme Gradient Boosting with XGBoost
Using XGBoost: A Quick Example
In [1]: import xgboost as xgb
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: from sklearn.model_selection import train_test_split
In [5]: class_data = pd.read_csv("classification_data.csv")
In [6]: X, y = class_data.iloc[:,:-1], class_data.iloc[:,-1]
In [7]: X_train, X_test, y_train, y_test= train_test_split(X, y,
test_size=0.2, random_state=123)
In [8]: xg_cl = xgb.XGBClassifier(objective='binary:logistic',
n_estimators=10, seed=123)
In [9]: xg_cl.fit(X_train, y_train)
In [10]: preds = xg_cl.predict(X_test)
In [11]: accuracy = float(np.sum(preds==y_test))/y_test.shape[0]
In [12]: print("accuracy: %f" % (accuracy))
DataCamp Extreme Gradient Boosting with XGBoost
Let's begin using
XGBoost!
EXTREME GRADIENT BOOSTING WITH XGBOOST
DataCamp Extreme Gradient Boosting with XGBoost
What is a decision
tree?
EXTREME GRADIENT BOOSTING WITH XGBOOST
Sergey Fogelson
VP of Analytics, Viacom
DataCamp Extreme Gradient Boosting with XGBoost
Visualizing a decision tree
[1] https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.modeler.help/nodes_treebuilding.htm
DataCamp Extreme Gradient Boosting with XGBoost
Decision trees as base learners
Base learner - Individual learning algorithm in an ensemble algorithm
Composed of a series of binary questions
Predictions happen at the "leaves" of the tree
DataCamp Extreme Gradient Boosting with XGBoost
Decision trees and CART
Constructed iteratively (one decision at a time)
Until a stopping criterion is met
DataCamp Extreme Gradient Boosting with XGBoost
Individual decision trees tend to overfit
[1] http://scott.fortmann-roe.com/docs/BiasVariance.html
DataCamp Extreme Gradient Boosting with XGBoost
Individual decision trees tend to overfit
[1] http://scott.fortmann-roe.com/docs/BiasVariance.html
DataCamp Extreme Gradient Boosting with XGBoost
CART: Classification and Regression Trees
Each leaf always contains a real-valued score
Can later be converted into categories
DataCamp Extreme Gradient Boosting with XGBoost
Let's work with some
decision trees!
EXTREME GRADIENT BOOSTING WITH XGBOOST
DataCamp Extreme Gradient Boosting with XGBoost
What is Boosting?
EXTREME GRADIENT BOOSTING WITH XGBOOST
Sergey Fogelson
VP of Analytics, Viacom
DataCamp Extreme Gradient Boosting with XGBoost
Boosting overview
Not a specific machine learning algorithm
Concept that can be applied to a set of machine learning models
"Meta-algorithm"
Ensemble meta-algorithm used to convert many weak learners into a
strong learner
DataCamp Extreme Gradient Boosting with XGBoost
Weak learners and strong learners
Weak learner: ML algorithm that is slightly better than chance
Example: Decision tree whose predictions are slightly better than
50%
Boosting converts a collection of weak learners into a strong learner
Strong learner: Any algorithm that can be tuned to achieve good
performance
DataCamp Extreme Gradient Boosting with XGBoost
How boosting is accomplished
Iteratively learning a set of weak models on subsets of the data
Weighing each weak prediction according to each weak learner's
performance
Combine the weighted predictions to obtain a single weighted
prediction
... that is much better than the individual predictions themselves!
DataCamp Extreme Gradient Boosting with XGBoost
Boosting example
[1] https://xgboost.readthedocs.io/en/latest/model.html
DataCamp Extreme Gradient Boosting with XGBoost
Model evaluation through cross-validation
Cross-validation: Robust method for estimating the performance of a
model on unseen data
Generates many non-overlapping train/test splits on training data
Reports the average test set performance across all data splits
DataCamp Extreme Gradient Boosting with XGBoost
Cross-validation in XGBoost example
In [1]: import xgboost as xgb
In [2]: import pandas as pd
In [3]: class_data = pd.read_csv("classification_data.csv")
In [4]: churn_dmatrix = xgb.DMatrix(data=churn_data.iloc[:,:-1],
label=churn_data.month_5_still_here)
In [5]: params={"objective":"binary:logistic","max_depth":4}
In [6]: cv_results = xgb.cv(dtrain=churn_dmatrix, params=params, nfold=4,
num_boost_round=10, metrics="error", as_pandas=True)
In [7]: print("Accuracy: %f" %((1-cv_results["test-error-mean"]).iloc[-1]))
Accuracy: 0.88315
DataCamp Extreme Gradient Boosting with XGBoost
Let's practice!
EXTREME GRADIENT BOOSTING WITH XGBOOST
DataCamp Extreme Gradient Boosting with XGBoost
When should I use
XGBoost?
EXTREME GRADIENT BOOSTING WITH XGBOOST
Sergey Fogelson
VP of Analytics, Viacom
DataCamp Extreme Gradient Boosting with XGBoost
When to use XGBoost
You have a large number of training samples
Greater than 1000 training samples and less 100 features
The number of features < number of training samples
You have a mixture of categorical and numeric features
Or just numeric features
DataCamp Extreme Gradient Boosting with XGBoost
When to NOT use XGBoost
Image recognition
Computer vision
Natural language processing and understanding problems
When the number of training samples is significantly smaller than the
number of features
DataCamp Extreme Gradient Boosting with XGBoost
Let's practice!
EXTREME GRADIENT BOOSTING WITH XGBOOST

More Related Content

Similar to chapter1.pdf

Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018
Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018
Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018Amazon Web Services
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarSigOpt
 
Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Julien SIMON
 
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...Codiax
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningHoa Le
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ FyberDaniel Hen
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLDESMOND YUEN
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with sparkModern Data Stack France
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdDatabricks
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Giridhar Addepalli
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuningYosuke Mizutani
 
From Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMakerFrom Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMakerAmazon Web Services
 
Boosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning ProblemsBoosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning ProblemsDr Sulaimon Afolabi
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarinn5712036
 
Build, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at ScaleBuild, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at ScaleAmazon Web Services
 
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...Amazon Web Services
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarSigOpt
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntEugene Yan Ziyou
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsScott Clark
 

Similar to chapter1.pdf (20)

Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018
Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018
Run XGBoost with Amazon SageMaker (AIM335) - AWS re:Invent 2018
 
Erdi güngör bbs
Erdi güngör bbsErdi güngör bbs
Erdi güngör bbs
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques Webinar
 
Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)
 
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuning
 
From Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMakerFrom Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMaker
 
Boosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning ProblemsBoosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning Problems
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
 
Build, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at ScaleBuild, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at Scale
 
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise Webinar
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 

Recently uploaded

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 

Recently uploaded (20)

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 

chapter1.pdf

  • 1. DataCamp Extreme Gradient Boosting with XGBoost Welcome to the course! EXTREME GRADIENT BOOSTING WITH XGBOOST Sergey Fogelson VP of Analytics, Viacom
  • 2. DataCamp Extreme Gradient Boosting with XGBoost Before we get to XGBoost... Need to understand the basics of Supervised classification Decision trees Boosting
  • 3. DataCamp Extreme Gradient Boosting with XGBoost Supervised learning Relies on labeled data Have some understanding of past behavior
  • 4. DataCamp Extreme Gradient Boosting with XGBoost Supervised learning example Does a specific image contain a person's face? Training data: vectors of pixel values Labels: 1 or 0
  • 5. DataCamp Extreme Gradient Boosting with XGBoost Supervised learning: Classification Outcome can be binary or multi-class
  • 6. DataCamp Extreme Gradient Boosting with XGBoost Binary classification example Will a person purchase the insurance package given some quote?
  • 7. DataCamp Extreme Gradient Boosting with XGBoost Multi-class classification example Classifying the species of a given bird
  • 8. DataCamp Extreme Gradient Boosting with XGBoost AUC: Metric for binary classification models
  • 9. DataCamp Extreme Gradient Boosting with XGBoost Accuracy score and confusion matrix
  • 10. DataCamp Extreme Gradient Boosting with XGBoost Supervised learning with scikit-learn
  • 11. DataCamp Extreme Gradient Boosting with XGBoost Other supervised learning considerations Features can be either numeric or categorical Numeric features should be scaled (Z-scored) Categorical features should be encoded (one-hot)
  • 12. DataCamp Extreme Gradient Boosting with XGBoost Ranking Predicting an ordering on a set of choices
  • 13. DataCamp Extreme Gradient Boosting with XGBoost Recommendation Recommending an item to a user Based on consumption history and profile Example: Netflix
  • 14. DataCamp Extreme Gradient Boosting with XGBoost Let's get to work! EXTREME GRADIENT BOOSTING WITH XGBOOST
  • 15. DataCamp Extreme Gradient Boosting with XGBoost Introducing XGBoost EXTREME GRADIENT BOOSTING WITH XGBOOST Sergey Fogelson VP of Analytics, Viacom
  • 16. DataCamp Extreme Gradient Boosting with XGBoost What is XGBoost? Optimized gradient-boosting machine learning library Originally written in C++ Has APIs in several languages: Python R Scala Julia Java
  • 17. DataCamp Extreme Gradient Boosting with XGBoost What makes XGBoost so popular? Speed and performance Core algorithm is parallelizable Consistently outperforms single-algorithm methods State-of-the-art performance in many ML tasks
  • 18. DataCamp Extreme Gradient Boosting with XGBoost Using XGBoost: A Quick Example In [1]: import xgboost as xgb In [2]: import pandas as pd In [3]: import numpy as np In [4]: from sklearn.model_selection import train_test_split In [5]: class_data = pd.read_csv("classification_data.csv") In [6]: X, y = class_data.iloc[:,:-1], class_data.iloc[:,-1] In [7]: X_train, X_test, y_train, y_test= train_test_split(X, y, test_size=0.2, random_state=123) In [8]: xg_cl = xgb.XGBClassifier(objective='binary:logistic', n_estimators=10, seed=123) In [9]: xg_cl.fit(X_train, y_train) In [10]: preds = xg_cl.predict(X_test) In [11]: accuracy = float(np.sum(preds==y_test))/y_test.shape[0] In [12]: print("accuracy: %f" % (accuracy))
  • 19. DataCamp Extreme Gradient Boosting with XGBoost Let's begin using XGBoost! EXTREME GRADIENT BOOSTING WITH XGBOOST
  • 20. DataCamp Extreme Gradient Boosting with XGBoost What is a decision tree? EXTREME GRADIENT BOOSTING WITH XGBOOST Sergey Fogelson VP of Analytics, Viacom
  • 21. DataCamp Extreme Gradient Boosting with XGBoost Visualizing a decision tree [1] https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.modeler.help/nodes_treebuilding.htm
  • 22. DataCamp Extreme Gradient Boosting with XGBoost Decision trees as base learners Base learner - Individual learning algorithm in an ensemble algorithm Composed of a series of binary questions Predictions happen at the "leaves" of the tree
  • 23. DataCamp Extreme Gradient Boosting with XGBoost Decision trees and CART Constructed iteratively (one decision at a time) Until a stopping criterion is met
  • 24. DataCamp Extreme Gradient Boosting with XGBoost Individual decision trees tend to overfit [1] http://scott.fortmann-roe.com/docs/BiasVariance.html
  • 25. DataCamp Extreme Gradient Boosting with XGBoost Individual decision trees tend to overfit [1] http://scott.fortmann-roe.com/docs/BiasVariance.html
  • 26. DataCamp Extreme Gradient Boosting with XGBoost CART: Classification and Regression Trees Each leaf always contains a real-valued score Can later be converted into categories
  • 27. DataCamp Extreme Gradient Boosting with XGBoost Let's work with some decision trees! EXTREME GRADIENT BOOSTING WITH XGBOOST
  • 28. DataCamp Extreme Gradient Boosting with XGBoost What is Boosting? EXTREME GRADIENT BOOSTING WITH XGBOOST Sergey Fogelson VP of Analytics, Viacom
  • 29. DataCamp Extreme Gradient Boosting with XGBoost Boosting overview Not a specific machine learning algorithm Concept that can be applied to a set of machine learning models "Meta-algorithm" Ensemble meta-algorithm used to convert many weak learners into a strong learner
  • 30. DataCamp Extreme Gradient Boosting with XGBoost Weak learners and strong learners Weak learner: ML algorithm that is slightly better than chance Example: Decision tree whose predictions are slightly better than 50% Boosting converts a collection of weak learners into a strong learner Strong learner: Any algorithm that can be tuned to achieve good performance
  • 31. DataCamp Extreme Gradient Boosting with XGBoost How boosting is accomplished Iteratively learning a set of weak models on subsets of the data Weighing each weak prediction according to each weak learner's performance Combine the weighted predictions to obtain a single weighted prediction ... that is much better than the individual predictions themselves!
  • 32. DataCamp Extreme Gradient Boosting with XGBoost Boosting example [1] https://xgboost.readthedocs.io/en/latest/model.html
  • 33. DataCamp Extreme Gradient Boosting with XGBoost Model evaluation through cross-validation Cross-validation: Robust method for estimating the performance of a model on unseen data Generates many non-overlapping train/test splits on training data Reports the average test set performance across all data splits
  • 34. DataCamp Extreme Gradient Boosting with XGBoost Cross-validation in XGBoost example In [1]: import xgboost as xgb In [2]: import pandas as pd In [3]: class_data = pd.read_csv("classification_data.csv") In [4]: churn_dmatrix = xgb.DMatrix(data=churn_data.iloc[:,:-1], label=churn_data.month_5_still_here) In [5]: params={"objective":"binary:logistic","max_depth":4} In [6]: cv_results = xgb.cv(dtrain=churn_dmatrix, params=params, nfold=4, num_boost_round=10, metrics="error", as_pandas=True) In [7]: print("Accuracy: %f" %((1-cv_results["test-error-mean"]).iloc[-1])) Accuracy: 0.88315
  • 35. DataCamp Extreme Gradient Boosting with XGBoost Let's practice! EXTREME GRADIENT BOOSTING WITH XGBOOST
  • 36. DataCamp Extreme Gradient Boosting with XGBoost When should I use XGBoost? EXTREME GRADIENT BOOSTING WITH XGBOOST Sergey Fogelson VP of Analytics, Viacom
  • 37. DataCamp Extreme Gradient Boosting with XGBoost When to use XGBoost You have a large number of training samples Greater than 1000 training samples and less 100 features The number of features < number of training samples You have a mixture of categorical and numeric features Or just numeric features
  • 38. DataCamp Extreme Gradient Boosting with XGBoost When to NOT use XGBoost Image recognition Computer vision Natural language processing and understanding problems When the number of training samples is significantly smaller than the number of features
  • 39. DataCamp Extreme Gradient Boosting with XGBoost Let's practice! EXTREME GRADIENT BOOSTING WITH XGBOOST