
Adding Tree and Tree


Brushfire: Distributed Decision Tree Ensemble Learning in Scala

Published in: Technology

  1. Adding tree and tree @avibryant
  2. Brushfire: Distributed, Generic, Decision Tree Learning in Scala (using Hadoop) @avibryant Open source: Real Soon Now
  3. Vun!
  4. Two! +
  5. Tree!
  6. Do you like cookies? {height: 5, color: blue, wears: fur} ? {height: 7, color: yellow, wears: feathers} ? {height: 3, color: green, wears: garbage} ? {height: 5, color: yellow, wears: stripes} ? {height: 4, color: orange, wears: stripes} ?
  7. Do you like cookies? color != blue color = blue
  8. Does Cookie Monster like Cookies? color != blue color = blue
  9. Is Cookie Monster Blue? color != blue color = blue
  10. Cooooookie! color != blue color = blue cookie!
  11. Do you like cookies? color != blue color = blue yuck ok cookie! wears != stripes wears = stripes
  12. color != blue color = blue T T T wears != stripes wears = stripes
  13. color != blue color = blue T T T wears != stripes wears = stripes Do you like cookies? How many cookies will you eat? What’s your favorite kind of cookie?
  14. Bootstrap or k-fold? Chi-square or entropy? Wow! Classification or regression? Binary splits or multiway? Out-of-bag or out-of-time? One tree or many? Binary or multi-class?
  15. trait Evaluator[V,T] trait Tree[V,T] trait Splitter[V,T] trait Error[T,E] Wow! Such types! case class Instance[V,T]
  16. false true false true Binary classification
  17. 0.1 0.4 0.0 0.9 Binary classification
  18. T+T+T+T= T T T T T+T+T+T+T= T+T+T+T+T= T+T+T=
  19. Binary classification
  20. Bigger (data) = Better (models) Generic != Fast “Why do you rob banks?”
  21. Learning a tree in Scalding 11 passes through the data 21 MapReduce steps
  22. T T
  23. T T T T
  24. T T T T T T T T
  25. Step 1/21 T
  26. {height: 5, color: blue, wears: fur} {height: 7, color: yellow, wears: feathers} {height: 3, color: green, wears: garbage} {height: 5, color: yellow, wears: stripes} {height: 4, color: orange, wears: stripes}
  27. Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] T T T T T T T T T T Map T Reduce
  28. color != blue = blue T T color != yellow = yellow T T height < 5 >= 5 T T ? Step 2/21
  29. color != blue = blue T T color Step 2/21 != yellow = yellow T T ?
  30. blue yellow green yellow orange
  31. blue yellow green yellow orange
  32. Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] S S S S S S S S Map Reduce Step 2/21 S S Other options: CountMinSketch QTree …
  33. V => Boolean V => Boolean T T
  34. V => Boolean V => Boolean T T T V => Boolean
  35. Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] S S S S S S S S Step 3/21 S S S Split[V,T] Split[V,T] Split[V,T] Split[V,T]
  36. Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] S S S S S S S S S Step 3/21 S S S Split[V,T] Split[V,T] Split[V,T] Split[V,T] S S S S S S
  37. Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] Instance[V,T] S S S S S S S S S Step 3/21 S S S Split[V,T] Split[V,T] Split[V,T] Split[V,T] S S S S S S S S S S Split[V,T] Split[V,T] Split[V,T]
  38. Instance[V,T] Instance[V,T] Instance[V,T]
  39. Instance[V,T] Instance[V,T] Instance[V,T] … Forests!
  40. Instance[V,T] Instance[V,T] Instance[V,T] …
  41. V? {height: 5, color: blue, wears: fur} ? {height: 7, color: yellow, wears: feathers} ? {height: 3, color: green, wears: garbage} ? {height: 5, color: yellow, wears: stripes} ? {height: 4, color: orange, wears: stripes} ?
  42. PLANET http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36296.pdf Scalding + Algebird http://github.com/twitter/scalding http://github.com/twitter/algebird Coming soon http://github.com/stripe/brushfire
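
The slides name `Instance[V,T]`, predicates of type `V => Boolean`, and target values `T` that are summed ("T+T+T+T=") in a map step followed by a reduce step. The sketch below is one possible reading of that idea in plain Scala collections, not Brushfire's actual code. The `Node`, `Leaf`, `SplitNode`, `plus`, and `leafTargets` names are hypothetical, the fields of `Instance` are an assumption, and the true/false labels attached to the five example rows are invented for illustration, since the slides show only "?" for them. A hand-written `plus` stands in for the kind of summing that a library such as Algebird could supply.

```scala
object BrushfireSketch {
  // A training example: named features of type V plus a target of type T.
  // (The field names are an assumption; the slides show only the type name.)
  case class Instance[V, T](features: Map[String, V], target: T)

  // Hypothetical minimal tree: leaves carry a target, inner nodes carry
  // a V => Boolean predicate, as on the "V => Boolean" slides.
  sealed trait Node[V, T]
  case class Leaf[V, T](index: Int, target: T) extends Node[V, T]
  case class SplitNode[V, T](
      feature: String,
      predicate: V => Boolean,
      ifTrue: Node[V, T],
      ifFalse: Node[V, T]
  ) extends Node[V, T]

  // Walk the tree to find which leaf a feature map lands in.
  // A missing feature is treated as failing the predicate.
  def leafIndex[V, T](node: Node[V, T], features: Map[String, V]): Int =
    node match {
      case Leaf(i, _) => i
      case SplitNode(f, p, t, e) =>
        if (features.get(f).exists(p)) leafIndex(t, features)
        else leafIndex(e, features)
    }

  // Binary classification target: counts of false and true labels.
  type Counts = Map[Boolean, Long]

  // "T + T": add two count maps key by key.
  def plus(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet)
      .map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L)))
      .toMap

  // Map each instance to (leaf, target), then sum the targets per leaf.
  def leafTargets[V](
      tree: Node[V, Counts],
      data: Seq[Instance[V, Counts]]
  ): Map[Int, Counts] =
    data
      .map(i => leafIndex(tree, i.features) -> i.target) // "map" step
      .groupBy(_._1)
      .map { case (leaf, pairs) =>                       // "reduce" step
        leaf -> pairs.map(_._2).foldLeft(Map.empty[Boolean, Long])(plus)
      }

  // The tree from the cookie slides: split on color = blue, then on
  // wears = stripes for everything that is not blue.
  val tree: Node[String, Counts] =
    SplitNode[String, Counts]("color", _ == "blue",
      Leaf[String, Counts](0, Map.empty),
      SplitNode[String, Counts]("wears", _ == "stripes",
        Leaf[String, Counts](1, Map.empty),
        Leaf[String, Counts](2, Map.empty)))

  def label(likesCookies: Boolean): Counts = Map(likesCookies -> 1L)

  // Rows from the slides (height omitted so that V is simply String).
  // The labels are made up for this example.
  val data: Seq[Instance[String, Counts]] = Seq(
    Instance(Map("color" -> "blue", "wears" -> "fur"), label(true)),
    Instance(Map("color" -> "yellow", "wears" -> "feathers"), label(false)),
    Instance(Map("color" -> "green", "wears" -> "garbage"), label(false)),
    Instance(Map("color" -> "yellow", "wears" -> "stripes"), label(true)),
    Instance(Map("color" -> "orange", "wears" -> "stripes"), label(false))
  )

  val result: Map[Int, Counts] = leafTargets(tree, data)
}
```

Because `plus` is associative and has an empty map as its zero, partial sums computed on different machines can be combined in any grouping, which is what makes the per-leaf totals suitable for a MapReduce-style job. Swapping `Counts` for another summable type would change the kind of prediction without changing `leafTargets`.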
