Strata New York 2012

This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications.

  1. Online Learning: Bayesian bandits and more
  2. whoami – Ted Dunning. tdunning@maprtech.com, tdunning@apache.org, @ted_dunning. We’re hiring at MapR. For slides and other info: http://www.slideshare.net/tdunning
  3. Online. Scalable. Incremental.
  4. Scalability and Learning: What does scalable mean? What are the inherent characteristics of scalable learning? What are the logical implications?
  5. Scalable ≈ on-line, if you squint just right.
  6. Unit of work ≈ unit of time.
  7. [Diagram: an infinite data stream feeds a learning algorithm that maintains state.]
  8. Pick One
  9. [Image-only slide]
  10. [Image-only slide]
  11. Now pick again
  12. A Quick Diversion. You see a coin: what is the probability of heads? Could it be larger or smaller than that? I flip the coin and, while it is in the air, ask again. I catch the coin and ask again. I look at the coin (and you don’t) and ask again. Why does the answer change? And did it ever have a single value?
  13. Which One to Play? One coin may be better than the other: the better coin pays off at some rate, and playing the other pays off at a lesser rate, so playing the lesser coin has an “opportunity cost”. But how do we know which is which? Explore versus exploit! (A sketch of one answer follows below.)
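The “Bayesian bandits” of the title point at a standard resolution of this explore/exploit tension: Thompson sampling. The transcript does not include the deck’s actual demo code, so the following is only a minimal sketch of the technique, with made-up payoff rates:

```python
import random

# Minimal Thompson sampling for two "coins" (Bernoulli bandits).
# Each coin's unknown payoff rate gets a Beta posterior; wins and
# losses are simply the Beta parameters, starting from a Beta(1, 1)
# ("I dunno") prior. The payoff rates below are made up for the demo.
true_rates = [0.55, 0.45]   # hidden from the algorithm
wins = [1, 1]               # Beta alpha parameters
losses = [1, 1]             # Beta beta parameters

for _ in range(1000):
    # Sample a plausible rate for each coin from its posterior and
    # play the coin whose sample is largest. Uncertain coins get
    # explored, proven coins get exploited, with no tuning knobs.
    samples = [random.betavariate(wins[i], losses[i]) for i in (0, 1)]
    choice = samples.index(max(samples))
    if random.random() < true_rates[choice]:
        wins[choice] += 1
    else:
        losses[choice] += 1

print("plays per coin:", [wins[i] + losses[i] - 2 for i in (0, 1)])
print("posterior means:", [wins[i] / (wins[i] + losses[i]) for i in (0, 1)])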
  14. A First Conclusion: probability as expressed by humans is subjective and depends on information and experience.
  15. A Second Conclusion: a single number is a bad way to express uncertain knowledge; a distribution of values might be better.
  16. I Dunno
  17. 5 and 5
  18. 2 and 10
  19. The Cynic Among Us
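Slides 16–18 presumably show distributions over the coin’s heads probability. Assuming the titles name Beta parameters, which the transcript does not confirm, a Beta(1, 1) is the uniform “I dunno” state, while Beta(5, 5) and Beta(2, 10) encode progressively more opinionated evidence. A small sketch of how those parameter pairs express different states of knowledge:

```python
from scipy.stats import beta

# Assumed mapping from slide titles to Beta parameters:
#   "I dunno"  -> Beta(1, 1): uniform, any heads probability is plausible
#   "5 and 5"  -> Beta(5, 5): evidence balanced around 0.5
#   "2 and 10" -> Beta(2, 10): evidence that heads is rare
priors = {"I dunno": (1, 1), "5 and 5": (5, 5), "2 and 10": (2, 10)}

for name, (a, b) in priors.items():
    dist = beta(a, b)
    lo, hi = dist.ppf(0.05), dist.ppf(0.95)
    print(f"{name:8s} mean={dist.mean():.2f}  90% interval=[{lo:.2f}, {hi:.2f}]")
```

Note how the 90% interval shrinks as the parameters (i.e., the accumulated evidence) grow: the distribution, unlike a single number, carries the uncertainty along with the estimate.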
  20. Demo
  21. An Example
  22. An Example
  23. The Cluster Proximity Features: every point can be described by its nearest cluster (4.3 bits per point in this case), with significant error that can be decreased, up to a point, by increasing the number of clusters. Or by its proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities); the error is then negligible, and the data is unwound into a simple representation. (Sketch below.)
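A minimal sketch of that feature, assuming k = 20 clusters (log₂ 20 ≈ 4.3 bits, matching the slide) and interpreting the “sign bit” as which side of the bisector between the two nearest centroids the point falls on; that reading of the sign bit is a guess from the slide text, not confirmed by it:

```python
import numpy as np

def proximity_features(point, centroids):
    """Describe a point by its two nearest centroids (log2(k) bits
    each), a sign bit, and the two proximities."""
    dists = np.linalg.norm(centroids - point, axis=1)
    i, j = np.argsort(dists)[:2]                # two nearest clusters
    # Hypothetical sign bit: which side of the hyperplane bisecting
    # the segment between the two nearest centroids.
    midpoint = (centroids[i] + centroids[j]) / 2
    axis = centroids[j] - centroids[i]
    sign = 1 if np.dot(point - midpoint, axis) >= 0 else 0
    return int(i), int(j), sign, dists[i], dists[j]

rng = np.random.default_rng(0)
centroids = rng.random((20, 10))                # k = 20 clusters in 10-d
print(proximity_features(rng.random(10), centroids))
```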
  24. Diagonalized Cluster Proximity
  25. Lots of Clusters Are Fine
  26. Surrogate Method: start with a sloppy clustering into κ = k log n clusters and use those clusters as a weighted surrogate for the data; then cluster the surrogate data using ball k-means. The results are provably high quality for highly clusterable data. The sloppy clustering can be done on-line, the surrogate can be kept in memory, and a ball k-means pass can be done at any time. (Sketch below.)
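A rough sketch of the idea, with two simplifications to flag: the merge threshold here is fixed, where a real implementation adapts it to keep roughly k log n centroids, and a weighted scikit-learn KMeans stands in for the ball k-means pass:

```python
import numpy as np
from sklearn.cluster import KMeans

def sloppy_cluster(points, threshold):
    """One-pass 'sloppy' clustering: fold each point into the nearest
    existing centroid if it is close enough, otherwise start a new
    weighted centroid. The surviving centroids plus their weights are
    the surrogate for the whole stream."""
    centroids, weights = [], []
    for x in points:
        if centroids:
            dists = np.linalg.norm(np.asarray(centroids) - x, axis=1)
            i = int(np.argmin(dists))
            if dists[i] < threshold:
                w = weights[i]
                centroids[i] = (w * centroids[i] + x) / (w + 1)
                weights[i] = w + 1
                continue
        centroids.append(np.array(x, dtype=float))
        weights.append(1.0)
    return np.asarray(centroids), np.asarray(weights)

rng = np.random.default_rng(0)
data = rng.random((10_000, 10))
surrogate, w = sloppy_cluster(data, threshold=0.8)

# The surrogate is small enough to recluster in memory at any time;
# weighted KMeans stands in for the ball k-means pass here.
final = KMeans(n_clusters=20, n_init=10).fit(surrogate, sample_weight=w)
print(len(surrogate), "surrogate points ->", final.cluster_centers_.shape)
```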
  27. Algorithm Costs: O(k d log n) per point for Lloyd’s algorithm, which is not so good for k = 2000, n = 10^8. Surrogate methods cost O(d log κ) = O(d (log k + log log n)) per point. This is a big deal: k d log n = 2000 × 10 × 26 ≈ 500,000, while log k + log log n = 11 + 5 = 16, and roughly 30,000 times faster makes the grade as a bona fide big deal.
  28. 30,000 times faster sounds good.
  29. 30,000 times faster sounds good, but that isn’t the big news.
  30. 30,000 times faster sounds good, but that isn’t the big news: these algorithms do on-line clustering.
  31. Parallel Speedup? [Chart: time per point (μs) versus number of threads, comparing the non-threaded and threaded versions against perfect scaling.]
  32. What about deployment?
  33. [Diagram repeated from slide 7: an infinite data stream feeds a learning algorithm that maintains state.]
  34. [Diagram: the data is split, and a mapper updates the state.]
  35. [Diagram: the data is split across multiple mappers.] With several mappers updating the model at once, we need shared memory for the state!
  36. whoami – Ted Dunning. We’re hiring at MapR. tdunning@maprtech.com, tdunning@apache.org, @ted_dunning. For slides and other info: http://www.slideshare.net/tdunning
