Strata new-york-2012
This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications.

Transcript

  • 1. Online Learning: Bayesian bandits and more
  • 2. whoami – Ted Dunning: tdunning@maprtech.com, tdunning@apache.org, @ted_dunning. We’re hiring at MapR. For slides and other info: http://www.slideshare.net/tdunning
  • 3. Online. Scalable. Incremental.
  • 4. Scalability and Learning: What does scalable mean? What are the inherent characteristics of scalable learning? What are the logical implications?
  • 5. Scalable ≈ On-line (if you squint just right)
  • 6. unit of work ≈ unit of time
  • 7. [Diagram: an infinite data stream feeds a learning algorithm that maintains state]
  • 8. Pick One
  • 9. [image-only slide]
  • 10. [image-only slide]
  • 11. Now pick again
  • 12. A Quick Diversion: You see a coin – what is the probability of heads? Could it be larger or smaller than that? I flip the coin and, while it is in the air, ask again. I catch the coin and ask again. I look at the coin (and you don’t) and ask again. Why does the answer change – and did it ever have a single value?
  • 13. Which One to Play? One coin may be better than the other. The better coin pays off at some rate; playing the other pays off at a lesser rate – playing the lesser coin has “opportunity cost”. But how do we know which is which? Explore versus Exploit!
  • 14. A First Conclusion: Probability as expressed by humans is subjective and depends on information and experience.
  • 15. A Second Conclusion: A single number is a bad way to express uncertain knowledge; a distribution of values might be better.
  • 16. I Dunno
  • 17. 5 and 5
  • 18. 2 and 10
  • 19. The Cynic Among Us
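
The three slides above are naturally read as Beta distributions: “I Dunno” as the uniform prior Beta(1,1), and “5 and 5” / “2 and 10” as observed heads/tails counts folded into the posterior. That reading is an assumption about figures the transcript cannot show; a minimal sketch under it:

    # Sketch only: assumes slides 16-18 show Beta distributions, with
    # "I Dunno" = the uniform prior Beta(1,1) and "5 and 5" / "2 and 10"
    # the observed heads/tails counts folded into the posterior.
    from scipy.stats import beta

    cases = {
        "I dunno (no flips)":   (0, 0),
        "5 heads and 5 tails":  (5, 5),
        "2 heads and 10 tails": (2, 10),
    }

    for label, (heads, tails) in cases.items():
        post = beta(1 + heads, 1 + tails)   # posterior under a uniform prior
        lo, hi = post.interval(0.9)         # 90% credible interval
        print(f"{label}: mean={post.mean():.2f}, 90% interval=({lo:.2f}, {hi:.2f})")

The more flips observed, the tighter the interval – exactly the “distribution instead of a single number” point of slide 15.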
  • 20. Demo
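
The demo itself is not captured in the transcript. The strategy the preceding slides build toward is Thompson sampling: draw a plausible payoff rate for each coin from its Beta posterior and play the coin whose draw is largest, so exploration happens automatically while uncertainty remains. A minimal sketch; the two payoff rates are made up for illustration:

    import random

    true_rates = [0.12, 0.15]      # hypothetical rates, unknown to the learner
    wins = [0, 0]
    losses = [0, 0]

    for _ in range(10000):
        # Sample a plausible rate for each coin from its Beta posterior ...
        draws = [random.betavariate(1 + wins[i], 1 + losses[i])
                 for i in range(len(true_rates))]
        # ... and play the coin whose sampled rate is largest.
        i = draws.index(max(draws))
        if random.random() < true_rates[i]:
            wins[i] += 1
        else:
            losses[i] += 1

    print("plays per coin:", [wins[i] + losses[i] for i in range(2)])

Most plays end up on the better coin, while the lesser coin is still tried often enough to keep its estimate honest – explore and exploit at once.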
  • 21. An Example
  • 22. An Example
  • 23. The Cluster Proximity Features: Every point can be described by its nearest cluster – 4.3 bits per point in this case – with significant error that can be decreased (to a point) by increasing the number of clusters. Or by its proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities) – the error is negligible, and this unwinds the data into a simple representation. A sketch of the encoding follows.
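
A minimal sketch of the two-nearest-cluster encoding (the sign bit is omitted, and the centroids and query point are made up; in the real pipeline the centroids come from the clustering described next):

    import numpy as np

    def proximity_features(x, centroids):
        """Describe x by the ids of its two nearest centroids plus the
        two distances (the 'proximities')."""
        d = np.linalg.norm(centroids - x, axis=1)
        nearest2 = np.argsort(d)[:2]
        return nearest2, d[nearest2]

    # Made-up centroids in the plane and one query point:
    centroids = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
    ids, prox = proximity_features(np.array([0.2, 0.1]), centroids)
    print(ids, prox)    # ids [0 1], distances approx. [0.224, 0.806]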
  • 24. Diagonalized Cluster Proximity
  • 25. Lots of Clusters Are Fine
  • 26. Surrogate Method: Start with sloppy clustering into κ = k log n clusters, and use these clusters as a weighted surrogate for the data. Cluster the surrogate data using ball k-means; the results are provably high quality for highly clusterable data. The sloppy clustering can be done on-line, the surrogate can be kept in memory, and the ball k-means pass can be done at any time. A sketch of the sloppy pass follows.
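
A much-simplified sketch of the sloppy on-line pass – not the actual streaming k-means implementation; the real algorithm merges probabilistically and collapses the sketch when it overflows, and the threshold handling here is purely illustrative:

    import numpy as np

    def sloppy_cluster(stream, kappa, threshold=1.0):
        """One on-line pass keeping roughly kappa weighted centroids.
        Each point merges into the nearest centroid if it is close
        enough, otherwise it starts a new centroid; when the sketch
        overflows, later points are treated more sloppily."""
        centroids, weights = [], []
        for x in stream:
            x = np.asarray(x, dtype=float)
            if centroids:
                d = np.linalg.norm(np.array(centroids) - x, axis=1)
                i = int(d.argmin())
                if d[i] < threshold:
                    w = weights[i]
                    centroids[i] = (w * centroids[i] + x) / (w + 1)  # weighted mean
                    weights[i] += 1
                    continue
            centroids.append(x)
            weights.append(1)
            if len(centroids) > kappa:    # sketch too big: loosen the threshold
                threshold *= 1.5
        return np.array(centroids), np.array(weights)

The (centroids, weights) pair is the in-memory surrogate; a batch ball k-means over those weighted points can then be run at any time.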
  • 27. Algorithm Costs: O(k d log n) per point for Lloyd’s algorithm … not so good for k = 2000, n = 10^8. Surrogate methods cost O(d log κ) = O(d (log k + log log n)) per point. This is a big deal: k d log n = 2000 × 10 × 26 ≈ 500,000, while log k + log log n = 11 + 5 = 16, so roughly 30,000 times faster – which makes the grade as a bona fide big deal.
  • 28. 30,000 times faster sounds good
  • 29. 30,000 times faster sounds good – but that isn’t the big news
  • 30. 30,000 times faster sounds good – but that isn’t the big news: these algorithms do on-line clustering
  • 31. Parallel Speedup? [Figure: time per point (μs) versus number of threads (1–20) for the threaded and non-threaded versions, compared against perfect scaling]
  • 32. What about deployment?
  • 33. [Diagram, repeated from slide 7: an infinite data stream feeds a learning algorithm that maintains state]
  • 34. [Diagram: the data stream is split and fed through a mapper that updates the model state]
  • 35. [Diagram: the split stream now feeds several mappers in parallel] Need shared memory!
  • 36. whoami – Ted Dunning: We’re hiring at MapR. tdunning@maprtech.com, tdunning@apache.org, @ted_dunning. For slides and other info: http://www.slideshare.net/tdunning