
Strata New York 2012


This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications.



  1. Online Learning: Bayesian bandits and more
  2. whoami – Ted Dunning: tdunning@maprtech.com, tdunning@apache.org, @ted_dunning. We’re hiring at MapR. For slides and other info: http://www.slideshare.net/tdunning
  3. Online. Scalable. Incremental.
  4. Scalability and Learning: What does scalable mean? What are the inherent characteristics of scalable learning? What are the logical implications?
  5. Scalable ≈ On-line (if you squint just right)
  6. unit of work ≈ unit of time
  7. [Diagram: an infinite data stream feeds a learning process that maintains state]
  8. Pick One
  9. [Image-only slide]
  10. [Image-only slide]
  11. Now pick again
  12. A Quick Diversion: You see a coin – what is the probability of heads? Could it be larger or smaller than that? I flip the coin and, while it is in the air, ask again. I catch the coin and ask again. I look at the coin (and you don’t) and ask again. Why does the answer change? And did it ever have a single value?
  13. Which One to Play? One coin may be better than the other. The better coin pays off at some rate; playing the other pays off at a lesser rate – playing the lesser coin has “opportunity cost”. But how do we know which is which? Explore versus exploit!
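Slide 13’s explore-versus-exploit question is exactly what a Bayesian bandit answers. Here is a minimal Thompson-sampling sketch in Python (the payoff rates and trial count are made up for the simulation, not taken from the slides): keep a Beta posterior per coin, sample a plausible payoff rate from each, and play the winner.

```python
import random

# True payoff rates of the two "coins" -- unknown to the player.
# These values are made up for the simulation.
true_rates = [0.55, 0.45]

# One Beta(1, 1) prior per coin, tracked as (wins, losses) counts.
wins = [1, 1]
losses = [1, 1]

for trial in range(10000):
    # Thompson sampling: draw a plausible payoff rate for each coin
    # from its posterior, then play the coin with the larger draw.
    # Uncertain coins get explored; good coins get exploited.
    samples = [random.betavariate(wins[i], losses[i]) for i in range(2)]
    pick = samples.index(max(samples))

    # Observe the simulated payoff and update that coin's posterior.
    if random.random() < true_rates[pick]:
        wins[pick] += 1
    else:
        losses[pick] += 1

for i in range(2):
    plays = wins[i] + losses[i] - 2
    print(f"coin {i}: played {plays} times, "
          f"posterior mean {wins[i] / (wins[i] + losses[i]):.3f}")
```

Because draws from a wide posterior vary a lot, the uncertain coin keeps getting tried until the evidence settles down; the whole explore/exploit trade is handled by the sampling step.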
  14. A First Conclusion: Probability as expressed by humans is subjective and depends on information and experience.
  15. A Second Conclusion: A single number is a bad way to express uncertain knowledge. A distribution of values might be better.
  16. I Dunno
  17. 5 and 5
  18. 2 and 10
  19. The Cynic Among Us
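Slides 16–19 presumably plot Beta distributions for these cases. Under that assumption (reading “5 and 5” as 5 heads and 5 tails on top of a uniform prior), a small sketch of how the posterior over the heads probability tightens as evidence accumulates:

```python
from math import sqrt

# (heads + 1, tails + 1) pseudo-counts over a uniform Beta(1, 1) prior.
# Reading "5 and 5" as 5 heads and 5 tails is an assumption.
cases = {"I dunno": (1, 1), "5 and 5": (6, 6), "2 and 10": (3, 11)}

for label, (a, b) in cases.items():
    mean = a / (a + b)                               # posterior mean
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # posterior spread
    print(f"{label:8s} -> Beta({a:2d},{b:2d}): mean {mean:.2f}, sd {sd:.2f}")
```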
  20. Demo
  21. An Example
  22. An Example
  23. The Cluster Proximity Features: Every point can be described by the nearest cluster – 4.3 bits per point in this case – with significant error that can be decreased (to a point) by increasing the number of clusters. Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities) – the error is negligible, and the data is unwound into a simple representation.
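A sketch of the two-nearest-cluster encoding just described, with made-up data and 20 clusters so that a cluster id costs log₂ 20 ≈ 4.3 bits, matching the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 10))   # made-up 10-dimensional data
centers = rng.normal(size=(20, 10))    # 20 clusters: log2(20) ~ 4.3 bits per id

def proximity_features(x, centers):
    """Describe x by its two nearest clusters plus the distances to them."""
    d = np.linalg.norm(centers - x, axis=1)
    i, j = np.argsort(d)[:2]           # ids of the two nearest clusters
    return i, j, d[i], d[j]            # 2 cluster ids + 2 proximities

i, j, di, dj = proximity_features(points[0], centers)
print(f"nearest clusters {i} and {j}, at distances {di:.2f} and {dj:.2f}")
```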
  24. Diagonalized Cluster Proximity
  25. Lots of Clusters Are Fine
  26. Surrogate Method: Start with sloppy clustering into κ = k log n clusters. Use these clusters as a weighted surrogate for the data. Cluster the surrogate data using ball k-means. The results are provably high quality for highly clusterable data. The sloppy clustering can be done on-line, the surrogate can be kept in memory, and the ball k-means pass can be done at any time.
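A rough one-dimensional illustration of the sloppy first pass (a simplification, not the actual streaming k-means code; the fixed merge threshold in particular is an assumption):

```python
import math
import random

def sloppy_cluster(stream, k, n, threshold=1.0):
    """One sloppy on-line pass keeping at most kappa = k log n weighted centroids.

    Each point either merges into the nearest centroid or starts a new one.
    The real algorithm adapts the threshold and collapses centroids when the
    budget is exceeded; this fixed threshold is a simplification.
    """
    kappa = int(k * math.log(n))
    centroids = []                      # list of [position, weight] pairs
    for x in stream:
        if centroids:
            c = min(centroids, key=lambda c: abs(c[0] - x))
            if abs(c[0] - x) < threshold or len(centroids) >= kappa:
                c[0] = (c[0] * c[1] + x) / (c[1] + 1)  # weighted merge
                c[1] += 1
                continue
        centroids.append([x, 1])
    return centroids                    # in-memory surrogate for the data

# Three well-separated 1-d clusters; a ball k-means pass over the weighted
# centroids (not shown) could recover them at any time.
data = [random.gauss(mu, 0.3) for mu in (0, 5, 10) for _ in range(1000)]
random.shuffle(data)
surrogate = sloppy_cluster(data, k=3, n=len(data))
print(len(surrogate), "weighted centroids stand in for", len(data), "points")
```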
  27. Algorithm Costs: O(k d log n) per point for Lloyd’s algorithm … not so good for k = 2000, n = 10⁸. Surrogate methods: O(d log κ) = O(d (log k + log log n)) per point. This is a big deal: k d log n = 2000 × 10 × 26 ≈ 500,000, while log k + log log n = 11 + 5 = 17 – 30,000 times faster makes the grade as a bona fide big deal.
  28. 30,000 times faster sounds good
  29. 30,000 times faster sounds good … but that isn’t the big news
  30. 30,000 times faster sounds good … but that isn’t the big news: these algorithms do on-line clustering
  31. Parallel Speedup? [Plot: time per point (μs, 10–200) versus number of threads (1–20), comparing the non-threaded version, a threaded version with 2–16 threads, and perfect scaling]
  32. What about deployment?
  33. [Diagram: an infinite data stream feeds a learning process that maintains state]
  34. [Diagram: a single mapper reads a split of the data and updates the model state]
  35. Need shared memory! [Diagram: several mappers read splits in parallel, all updating one shared state]
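A toy illustration of the shared-memory point, using Python multiprocessing as a stand-in for mappers (nothing here is MapR-specific): each worker reads its own split, but all of them must fold their updates into one shared state.

```python
from multiprocessing import Lock, Process, Value

def mapper(split, state, lock):
    """Each mapper consumes its own split but updates one shared model state."""
    local = float(sum(split))           # stand-in for local learning work
    with lock:                          # shared updates need coordination
        state.value += local

if __name__ == "__main__":
    splits = [range(0, 100), range(100, 200), range(200, 300)]
    state = Value("d", 0.0)             # the shared "model state"
    lock = Lock()
    mappers = [Process(target=mapper, args=(s, state, lock)) for s in splits]
    for m in mappers:
        m.start()
    for m in mappers:
        m.join()
    print("shared state after all mappers:", state.value)
```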
  36. whoami – Ted Dunning: tdunning@maprtech.com, tdunning@apache.org, @ted_dunning. We’re hiring at MapR. For slides and other info: http://www.slideshare.net/tdunning
