Strata new-york-2012

This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications.

Presentation Transcript

• Online Learning: Bayesian bandits and more
• whoami: Ted Dunning, tdunning@maprtech.com / tdunning@apache.org, @ted_dunning. We're hiring at MapR. For slides and other info: http://www.slideshare.net/tdunning
• Online. Scalable. Incremental.
• Scalability and Learning: What does scalable mean? What are inherent characteristics of scalable learning? What are the logical implications?
• Scalable ≈ On-line, if you squint just right
• unit of work ≈ unit of time
• [Diagram: an infinite stream of data feeds a learning process, which maintains state]
• Pick One
• Now pick again
• A Quick Diversion: You see a coin. What is the probability of heads? Could it be larger or smaller than that? I flip the coin and, while it is in the air, ask again. I catch the coin and ask again. I look at the coin (and you don't) and ask again. Why does the answer change? And did it ever have a single value?
• Which One to Play? One may be better than the other. The better coin pays off at some rate; playing the other will pay off at a lesser rate, so playing the lesser coin has "opportunity cost". But how do we know which is which? Explore versus exploit!
• A First Conclusion: Probability as expressed by humans is subjective and depends on information and experience.
• A Second Conclusion: A single number is a bad way to express uncertain knowledge. A distribution of values might be better.
• I Dunno
• 5 and 5
• 2 and 10
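
The three slide titles above ("I Dunno", "5 and 5", "2 and 10") presumably name Beta distributions over the coin's heads-probability. A minimal sketch of that reading, using scipy; the flip counts at the end are made up:

    # Belief about a coin's heads-probability as a Beta distribution
    # rather than a single number. Beta(1,1) is the flat "I dunno" prior;
    # Beta(5,5) and Beta(2,10) encode more opinionated beliefs.
    from scipy.stats import beta

    for a, b in [(1, 1), (5, 5), (2, 10)]:
        d = beta(a, b)
        lo, hi = d.interval(0.95)
        print(f"Beta({a},{b}): mean={d.mean():.2f}, 95% in [{lo:.2f}, {hi:.2f}]")

    # Updating is just counting: after 3 heads and 1 tail, the flat
    # prior Beta(1,1) becomes the posterior Beta(1+3, 1+1).
    posterior = beta(4, 2)
    print(f"posterior mean: {posterior.mean():.2f}")
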
• The Cynic Among Us
• Demo
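
The demo itself is not reproducible from a transcript, but a minimal sketch of the Bayesian-bandit loop it presumably shows, with made-up payoff rates (Thompson sampling: sample a plausible rate for each coin from its Beta posterior and play the apparent winner):

    import random

    true_rates = [0.12, 0.15]            # hidden payoff rates (made up)
    wins, losses = [0, 0], [0, 0]

    for _ in range(10000):
        # sample a plausible payoff rate for each coin from its Beta posterior
        samples = [random.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(2)]
        pick = samples.index(max(samples))   # play the apparently better coin
        if random.random() < true_rates[pick]:
            wins[pick] += 1
        else:
            losses[pick] += 1

    # exploration tapers off as the posteriors sharpen around the true rates
    print("plays per coin:", [wins[i] + losses[i] for i in range(2)])
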
• An Example
• The Cluster Proximity Features: Every point can be described by the nearest cluster (4.3 bits per point in this case), with significant error that can be decreased (to a point) by increasing the number of clusters. Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities); the error is negligible, and this unwinds the data into a simple representation (see the sketch below).
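
A sketch of the proximity encoding just described, with hypothetical names (the slides give no code): each point is replaced by the identities of its two nearest cluster centers plus the distances to them.

    import numpy as np

    def proximity_features(point, centers):
        # distances from the point to every cluster center
        d = np.linalg.norm(centers - point, axis=1)
        nearest, second = np.argsort(d)[:2]
        # a cluster id costs about log2(k) bits (4.3 bits for k ≈ 20);
        # adding the two proximities makes the reconstruction error negligible
        return int(nearest), int(second), d[nearest], d[second]

    centers = np.random.rand(20, 10)   # e.g. 20 centers in 10 dimensions
    x = np.random.rand(10)
    print(proximity_features(x, centers))
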
• Diagonalized Cluster Proximity
• Lots of Clusters Are Fine
• Surrogate Method: Start with sloppy clustering into κ = k log n clusters. Use these clusters as a weighted surrogate for the data. Cluster the surrogate data using ball k-means. Results are provably high quality for highly clusterable data. The sloppy clustering can be done on-line, the surrogate can be kept in memory, and the ball k-means pass can be done at any time (a sketch follows).
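
A minimal sketch of the surrogate pipeline, assuming a deliberately crude threshold rule as a stand-in for the real sloppy (streaming) clusterer:

    import numpy as np

    def sloppy_cluster(stream, kappa, radius):
        # crude on-line clustering into at most kappa weighted centroids
        centers, weights = [], []
        for x in stream:
            if centers:
                d = np.linalg.norm(np.asarray(centers) - x, axis=1)
                i = int(np.argmin(d))
                # merge into the nearest centroid if close enough, or if
                # the centroid budget is exhausted
                if d[i] < radius or len(centers) >= kappa:
                    w = weights[i]
                    centers[i] = (centers[i] * w + x) / (w + 1)
                    weights[i] += 1
                    continue
            centers.append(np.asarray(x, dtype=float))
            weights.append(1)
        return np.asarray(centers), np.asarray(weights)

    # kappa ≈ k log n centroids form a small in-memory surrogate; a weighted
    # ball k-means pass over (centers, weights) can be run at any time.
    points = np.random.rand(10000, 10)
    centers, weights = sloppy_cluster(points, kappa=200, radius=0.8)
    print(len(centers), "weighted centroids")
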
• Algorithm Costs: O(k d log n) per point for Lloyd's algorithm … not so good for k = 2000, n = 10^8. Surrogate methods: O(d log κ) = O(d (log k + log log n)) per point. This is a big deal: k d log n = 2000 × 10 × 26 ≈ 500,000, while log k + log log n = 11 + 5 = 16, so roughly 30,000 times faster makes the grade as a bona fide big deal.
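
A quick check of the slide's arithmetic, comparing the same per-point quantities the slide compares (logs base 2):

    from math import log2

    k, d, n = 2000, 10, 1e8
    lloyd = k * d * log2(n)              # ≈ 2000 * 10 * 26 ≈ 500,000
    surrogate = log2(k) + log2(log2(n))  # ≈ 11 + 5 ≈ 16
    # the slide rounds the ratio to "30,000 times faster"
    print(f"{lloyd:,.0f} / {surrogate:.0f} ≈ {lloyd / surrogate:,.0f}x")
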
• 30,000 times faster sounds good … but that isn't the big news: these algorithms do on-line clustering.
• Parallel Speedup? [Chart: time per point (μs) versus number of threads, comparing the threaded version against the non-threaded baseline and against perfect scaling]
• What about deployment?
• [Diagram, repeated: an infinite stream of data feeds a learning process, which maintains state]
• [Diagram: the incoming data is split, with a mapper updating the learned state]
• [Diagram: the split data feeds multiple mappers in parallel; the mappers need shared memory for the state]
• whoami: Ted Dunning. We're hiring at MapR. tdunning@maprtech.com / tdunning@apache.org, @ted_dunning. For slides and other info: http://www.slideshare.net/tdunning