Introduction to XGBoost

An introduction to XGBoost. For more information, visit: http://shuaizhang.tech/


1. Introduction
2. Boosted Tree
3. Tree Ensemble
4. Additive Training
5. Split Algorithm

1 Introduction
• What can XGBoost do? (a minimal usage sketch follows this list)
  • Binary classification
  • Multiclass classification
  • Regression
  • Learning to rank
• XGBoost is a scalable, portable, and distributed gradient boosting (GBDT, GBRT, or GBM) library (as of 02 March 2017)
• Supported languages: Python, R, Java, Scala, C++, and more
• Supported platforms: a single machine, Hadoop, Spark, Flink, DataFlow
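As a taste of the library, a minimal sketch of training a binary classifier with the xgboost Python package (the synthetic data and the parameter values are illustrative assumptions, not from the slides):

import numpy as np
import xgboost as xgb

# Illustrative synthetic data: 100 instances, 5 features, binary labels.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1}
bst = xgb.train(params, dtrain, num_boost_round=10)  # 10 boosting rounds
preds = bst.predict(dtrain)  # predicted probabilities in [0, 1]
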
2 Boosted Tree
• Variants:
  • GBDT: gradient boosted decision tree
  • GBRT: gradient boosted regression tree
  • MART: multiple additive regression trees
  • LambdaMART, for ranking tasks
  • ...

2.1 CART
• CART: Classification and Regression Tree
• Classification example: three classes, two variables

2.1 CART
• Regression example: predicting the price of 1993-model cars
• Variables standardized (zero mean, unit variance)
(figure: partition of the input space and the resulting predictions)

2.1 CART
• Which variable should we use for a split? (a small Gini example follows this list)
  • Information gain
  • Gain ratio
  • Gini index
• Pruning: prevents overfitting
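As a quick illustration of one of these criteria, a minimal sketch of the Gini index (the formula is standard; the class counts below are made up):

# Gini index of a node: 1 minus the sum of squared class proportions.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([10, 10]))  # 0.5 -> maximally impure two-class node
print(gini([20, 0]))   # 0.0 -> pure node
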
2.2 CART
• Input: age, gender, occupation
• Goal: does the person like computer games?

3 Tree Ensemble
• What is a tree ensemble?
  • A single tree is not powerful enough
  • Boosted trees and random forests are both tree ensembles
• Benefits of tree ensembles:
  • Very widely used
  • Invariant to scaling of inputs
  • Learn higher-order interactions between features
  • Scalable

3 Tree Ensemble
• The prediction is the sum of the scores predicted by each tree, as written out below
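In symbols (a reconstruction in the standard XGBoost notation, since the slide's formula survives only as an image):

\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}

where K is the number of trees and \mathcal{F} is the space of regression trees.
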
3 Tree Ensemble: Elements of Supervised Learning
• Linear model
• Optimizing the training loss encourages predictive models
• Optimizing the regularization term encourages simple models (the combined objective is written below)
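The objective behind these two bullets is the usual trade-off (again in standard notation):

Obj(\Theta) = L(\Theta) + \Omega(\Theta)

with L the training loss (e.g. square loss or logistic loss) and \Omega the regularization term.
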
3 Tree Ensemble
• Assume we have K trees
• Parameters:
  • the structure of each tree, and the scores in the leaves
  • or simply use the functions as parameters
• Instead of learning weights in R^d, we are learning functions (trees)

3 Tree Ensemble
• How can we learn functions? Consider a step function: its parameters are the splitting positions and the height in each segment
• Training loss: how well does the function fit the points?
• Regularization: how do we define the complexity of the function?

3 Tree Ensemble
• Training loss: the error between predictions and targets
• Regularization: the number of splitting points and the L2 norm of the leaf weights (written out below)
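Written out, this is the standard XGBoost complexity term (a reconstruction, as the formula itself was an image):

\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

where T is the number of leaves and w_j the leaf weights.
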
3 Tree Ensemble
• We define a tree by a vector of leaf scores and a leaf index mapping function that maps an instance to a leaf, in symbols below
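In standard notation:

f_t(x) = w_{q(x)}, \qquad w \in \mathbb{R}^T, \quad q : \mathbb{R}^d \to \{1, \dots, T\}

where q maps an instance to a leaf index and w holds the leaf scores.
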
3 Tree Ensemble
• Objective
• Definition of complexity (both spelled out below)
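Combining the pieces above, the full objective takes the standard form:

Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)

with \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_j w_j^2 as defined earlier.
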
4 Additive Training (Boosting)
• We cannot use methods such as SGD to find the f_k, since they are trees rather than numerical vectors
• Instead, start from a constant prediction and add a new function each round, as shown below
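The additive schedule, in the usual notation:

\hat{y}_i^{(0)} = 0, \qquad \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
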
4 Additive Training (Boosting)
• How do we decide which f to add?
• The prediction at round t is \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
• Consider the square loss, expanded below
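Plugging the square loss into the round-t objective gives (a reconstruction of the standard derivation):

Obj^{(t)} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i^{(t-1)} - f_t(x_i) \right)^2 + \Omega(f_t)
          = \sum_{i=1}^{n} \left[ 2 \left( \hat{y}_i^{(t-1)} - y_i \right) f_t(x_i) + f_t(x_i)^2 \right] + \Omega(f_t) + \text{const}

where \hat{y}_i^{(t-1)} - y_i is (minus) the residual from the previous rounds.
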
4 Additive Training (Boosting)
• Taylor expansion of the objective
• Objective after expansion, shown below
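Using the second-order Taylor expansion f(x + \Delta x) \approx f(x) + f'(x)\Delta x + \frac{1}{2} f''(x) \Delta x^2, define

g_i = \partial_{\hat{y}^{(t-1)}} \, l(y_i, \hat{y}^{(t-1)}), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}} \, l(y_i, \hat{y}^{(t-1)})

so the objective becomes

Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)
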
4 Additive Training (Boosting)
• Our new goal, with constants removed, shown below
• Benefit: the loss enters only through g_i and h_i, so the same solver works for any twice-differentiable loss
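Dropping the constant terms leaves:

\sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)
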
4 Additive Training (Boosting)
• Define the instance set of leaf j as I_j = \{ i \mid q(x_i) = j \}
• Regroup the objective by leaf
• This is a sum of T independent quadratic functions
• Two facts about a single-variable quadratic function give the optimum in each leaf, as shown below
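Regrouped by leaf (using \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_j w_j^2 from above):

Obj^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T

The two quadratic facts: for G x + \frac{1}{2} H x^2 with H > 0, \arg\min_x = -G/H and \min_x = -G^2/(2H).
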
4 Additive Training (Boosting)
• Let us define G_j and H_j
• Results, shown below
• There can be infinitely many possible tree structures
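Spelled out (the standard result):

G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i

w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad Obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T

The smaller Obj^*, the better the tree structure; since we cannot enumerate all structures, we grow the tree greedily, as on the next slide.
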
4 Additive Training (Boosting)
• Greedy learning: we grow the tree greedily, one split at a time, using the gain formula below
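The gain of a candidate split, in the standard form:

Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma

i.e. the scores of the left and right children, minus the score of the unsplit node, minus the complexity cost \gamma of the extra leaf.
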
5 Splitting Algorithm
• Efficiently finding the best split
• What is the gain of a split rule x_j < a? Say x_j is age
• All we need is the sum of g and h on each side; then calculate the gain
• A left-to-right linear scan over the sorted instances is enough to decide the best split; a sketch follows
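A minimal Python sketch of that linear scan for a single feature (the function name, the standalone setting, and the example numbers are my own; XGBoost's real implementation is in C++ and also handles duplicates, missing values, and more):

def best_split(x, g, h, lam, gamma):
    """Scan sorted values of one feature for the split with maximal gain.

    x: feature values; g, h: per-instance gradients and Hessians.
    Assumes distinct feature values, for simplicity.
    """
    def score(G_, H_):
        # Structure score of a node with gradient sum G_ and Hessian sum H_.
        return G_ * G_ / (H_ + lam)

    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)              # totals over the whole node
    GL = HL = 0.0                      # running sums for the left side
    best_gain, best_thresh = 0.0, None
    for i in order[:-1]:               # the last point cannot start a right side
        GL += g[i]
        HL += h[i]
        GR, HR = G - GL, H - HL
        gain = 0.5 * (score(GL, HL) + score(GR, HR) - score(G, H)) - gamma
        if gain > best_gain:
            best_gain, best_thresh = gain, x[i]  # split rule: x <= x[i] goes left
    return best_gain, best_thresh

# Toy numbers: ages with made-up gradients, all Hessians equal to 1.
print(best_split([23, 31, 45, 19], [-1.0, 0.5, 2.0, -1.5], [1.0] * 4,
                 lam=1.0, gamma=0.0))  # -> (2.0833..., 23): best split is age <= 23
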
Supplementary
• Tree models work very well on tabular data, and are easy to use, interpret, and control
• They cannot extrapolate
• Deep Forest: Towards an Alternative to Deep Neural Networks, Zhi-Hua Zhou and Ji Feng, Nanjing University (submitted 28 Feb 2017)
  • Comparable performance and easy to train (fewer parameters)