
PyData Hamburg - Gradient Boosting

Gradient Boosted Decision Trees are a prize-winning machine learning model that can be used for classification, prediction and ranking tasks. This talk gives a theoretical introduction to Decision Trees and Gradient Boosting, with examples from practical applications.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

PyData Hamburg - Gradient Boosting

  1. A Deep Dive Into Gradient Boosting (Daniel Kohlsdorf)
  2. Why Gradient Boosting?
  3. 1. Outlier Filtering For Job Recommendations 2. Reranking Job Recommendations 3. Classifying Profiles: 1. Willingness To Change Job 2. Discipline Since Then …
  4. Decision Trees, Gradient Boosting and Application(s)
  5. Decision Trees
  6. Decision Trees
  7. Goal: 1. Partition the input space 2. Pure class distribution in each partition
  8. Decision Trees: Guillotine cuts
  9. Decision Trees: Guillotine cuts
  10. Decision Trees: Guillotine cuts
  11. Finding The Best Split [chart: age distribution of Spongebob fans vs. crime show fans]
  12. Finding The Best Split [charts: age distributions of Spongebob fans vs. crime show fans for a candidate split]
  13. Finding The Best Split [charts: age distributions of Spongebob fans vs. crime show fans for another candidate split]
  14. Finding the best split: • Choose the best split based on class distribution impurity • A common measure is entropy • Choose the split that reduces impurity the most (a Python sketch of entropy-based split selection follows the transcript) [charts: class distributions before and after the split]
  15. Greedily Constructing A Decision Tree
  16. Greedily Constructing A Decision Tree
  17. Greedily Constructing A Decision Tree
  18. Greedily Constructing A Decision Tree (a greedy construction sketch follows the transcript)
  19. Gradient Boosting: One Tree Is Not Enough
  20. Ensemble Methods: 1. Weighted combination of weak learners 2. Prediction is based on committee votes 3. Boosting: 1. Train the ensemble one weak learner at a time 2. Focus new learners on wrongly predicted examples (a committee-vote sketch follows the transcript)
  21. Gradient Boosting: 1. Learn a regressor 2. Compute the error residual (the gradient, as in deep learning) 3. Build a new model to predict that residual
  22. Gradient Boosting. Our initial model: for each datapoint, return 0
  23. Gradient Boosting
  24. Gradient Boosting (the full boosting loop is sketched in Python after the transcript)
  25. Building a Decision Tree from Gradients: we have a gradient for each training example; return pooled gradients instead of a class [chart: age split over Spongebob fans vs. crime show fans]
  26. Building a Decision Tree from Gradients: impurity is the magnitude of the pooled gradient (a gradient-tree sketch follows the transcript) [chart: age split over Spongebob fans vs. crime show fans]
  27. Not just regression: 1. RMSE [prediction] 2. Sigmoid [binary classification] 3. Softmax [multiclass classification] 4. Ranking loss [ranking] (the per-loss gradients are sketched after the transcript)
  28. Gradient Boosting @Xing
  29. Outlier Filtering
  30.
  31.
  32. Thanks! daniel.kohlsdorf@xing.com http://daniel-at-world.blogspot.com
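
For slides 11-14, a minimal sketch of entropy-based split selection on a single numeric feature. The names entropy and best_split and the toy age/fan data are illustrative assumptions, not code from the talk.

```python
import numpy as np

def entropy(labels):
    """Entropy of a class distribution, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(x, y):
    """Scan thresholds on one feature and return the one whose weighted
    child entropy (impurity after the split) is lowest."""
    best_threshold, best_impurity = None, np.inf
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        impurity = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Toy example in the spirit of the slides: age vs. Spongebob / crime-show fans.
age = np.array([8, 10, 12, 14, 35, 42, 50, 65])
fan = np.array(["spongebob"] * 4 + ["crimeshow"] * 4)
print(best_split(age, fan))   # a threshold around age 14 separates the two groups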
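
Slides 15-18 grow the tree greedily: pick the best split, partition, and recurse until each partition is pure or a depth limit is hit. A sketch assuming a small NumPy feature matrix; grow_tree and its dict-based node format are assumptions for illustration.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def grow_tree(X, y, depth=0, max_depth=3):
    """Greedy construction: pick the (feature, threshold) pair with the lowest
    weighted child entropy, split, and recurse on each partition."""
    # Stop when the partition is pure or the tree is deep enough.
    if entropy(y) == 0.0 or depth == max_depth:
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}           # majority class
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            mask = X[:, feature] <= threshold
            if mask.all() or not mask.any():
                continue
            impurity = (mask.sum() * entropy(y[mask]) +
                        (~mask).sum() * entropy(y[~mask])) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold, mask)
    if best is None:                                          # no useful split left
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}
    _, feature, threshold, mask = best
    return {"feature": feature, "threshold": threshold,
            "left":  grow_tree(X[mask], y[mask], depth + 1, max_depth),
            "right": grow_tree(X[~mask], y[~mask], depth + 1, max_depth)}

# Toy usage: one feature (age), two fan groups; a single split separates them.
X = np.array([[8.0], [12.0], [15.0], [40.0], [52.0], [60.0]])
y = np.array(["spongebob"] * 3 + ["crimeshow"] * 3)
print(grow_tree(X, y))
```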
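
Slide 20 describes an ensemble as a weighted committee of weak learners. A tiny sketch of the committee vote; the stumps, weights, and data below are made up for illustration.

```python
import numpy as np

def committee_predict(weak_learners, weights, X):
    """The ensemble prediction is a weighted vote of its weak learners;
    for classification, take the sign (or argmax) of the summed scores."""
    return sum(w * learner(X) for w, learner in zip(weights, weak_learners))

# Two hypothetical decision stumps on a single 'age' feature.
def stump_a(X):
    return np.where(X[:, 0] <= 20, 1.0, -1.0)

def stump_b(X):
    return np.where(X[:, 0] <= 40, 1.0, -1.0)

X = np.array([[10.0], [30.0], [50.0]])
print(committee_predict([stump_a, stump_b], [0.7, 0.3], X))   # [ 1.  -0.4 -1. ]
```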
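
Slides 21-24 walk through the boosting loop: the initial model returns 0 for every datapoint, each new tree is fit to the current residuals (the negative gradient of the squared error), and its predictions are added with a learning rate. A sketch using scikit-learn's DecisionTreeRegressor as the weak learner; the function names and the toy sine-curve data are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Squared-error gradient boosting: start from the all-zero prediction,
    fit each new tree to the current residuals, add it with a learning rate."""
    prediction = np.zeros(len(y))          # slide 22: "for each datapoint return 0"
    trees = []
    for _ in range(n_trees):
        residual = y - prediction          # error residual = negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

def predict(trees, X, learning_rate=0.1):
    """Sum the (scaled) predictions of all trees in the ensemble."""
    return learning_rate * sum(tree.predict(X) for tree in trees)

# Toy usage: learn a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
trees = gradient_boost(X, y)
print(np.mean((predict(trees, X) - y) ** 2))   # training MSE, should be small
```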
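
Slides 25-26: when a tree is grown on gradients instead of class labels, a leaf returns the pooled gradients of its examples, and a split is scored by the magnitude of the pooled gradient in each child. A sketch of that idea on a single feature; leaf_value, split_score, and the toy data are illustrative assumptions.

```python
import numpy as np

def leaf_value(gradients):
    """A leaf returns the pooled (here: averaged) gradients of the training
    examples that fall into it, not a class label."""
    return gradients.mean()

def split_score(gradients, mask):
    """Score a split by the pooled-gradient magnitude in each child (slide 26):
    gradients that all point the same way pool into a large value, while mixed
    gradients cancel out and score low."""
    left, right = gradients[mask], gradients[~mask]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    return left.sum() ** 2 / len(left) + right.sum() ** 2 / len(right)

def best_gradient_split(x, gradients):
    """Pick the threshold on one feature with the highest pooled-gradient score."""
    thresholds = np.unique(x)
    scores = [split_score(gradients, x <= t) for t in thresholds]
    return thresholds[int(np.argmax(scores))]

# Toy usage: residuals are positive for young users and negative for older
# ones, so the best split separates the two age groups.
age = np.array([10.0, 12.0, 14.0, 40.0, 50.0, 60.0])
grad = np.array([0.8, 0.9, 0.7, -0.6, -0.8, -0.7])
print(best_gradient_split(age, grad))   # 14.0
```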
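
Slide 27: the same loop covers more than regression because only the loss, and hence the gradient each tree is fit to, changes. Sketches of the negative gradients for the listed losses (the pairwise ranking loss is omitted here); the function names are assumptions.

```python
import numpy as np

def squared_error_gradient(y, prediction):
    """Regression (RMSE-style loss): the negative gradient is the residual."""
    return y - prediction

def logistic_gradient(y, prediction):
    """Binary classification: y in {0, 1}, prediction is a raw score squashed
    with a sigmoid; the negative gradient is y - sigmoid(prediction)."""
    return y - 1.0 / (1.0 + np.exp(-prediction))

def softmax_gradient(y_onehot, predictions):
    """Multiclass classification: one raw score per class; the negative
    gradient per class is y_onehot - softmax(predictions)."""
    exp = np.exp(predictions - predictions.max(axis=1, keepdims=True))
    return y_onehot - exp / exp.sum(axis=1, keepdims=True)
```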
