Strata New York 2012
This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications. Given by Ted Dunning at Strata New York.

    Strata New York 2012: Presentation Transcript

    • 1. Online Learning: Bayesian bandits and more
    • 2. whoami – Ted Dunning
      – tdunning@maprtech.com, tdunning@apache.org, @ted_dunning
      – We’re hiring at MapR
      – For slides and other info: http://www.slideshare.net/tdunning
    • 3. Online / Scalable / Incremental
    • 4. Scalability and Learning
      – What does scalable mean?
      – What are the inherent characteristics of scalable learning?
      – What are the logical implications?
    • 5. Scalable ≈ On-line (if you squint just right)
    • 6. unit of work ≈ unit of time
    • 7. [Diagram: an infinite data stream feeding into the learning state]
    • 8. Pick One
    • 9. [image-only slide]
    • 10. [image-only slide]
    • 11. Now pick again
    • 12. A Quick Diversion
      – You see a coin
        – What is the probability of heads?
        – Could it be larger or smaller than that?
      – I flip the coin and, while it is in the air, ask again
      – I catch the coin and ask again
      – I look at the coin (and you don’t) and ask again
      – Why does the answer change?
        – And did it ever have a single value?
    • 13. Which One to Play?
      – One may be better than the other
      – The better coin pays off at some rate
      – Playing the other will pay off at a lesser rate
        – Playing the lesser coin has “opportunity cost”
      – But how do we know which is which?
        – Explore versus exploit!
    • 14. A First Conclusion
      – Probability as expressed by humans is subjective and depends on information and experience
    • 15. A Second Conclusion
      – A single number is a bad way to express uncertain knowledge
      – A distribution of values might be better
    • 16. I Dunno
    • 17. 5 and 5
    • 18. 2 and 10
    • 19. The Cynic Among Us
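
    Slides 17 and 18 presumably show Beta(5, 5) and Beta(2, 10) beliefs about the two coins. A minimal sketch of Thompson sampling, the Bayesian-bandit strategy these slides build toward, assuming Bernoulli payoffs and Beta beliefs; the class name, variable names, and true rates below are illustrative, not from the deck:

        import random

        class BetaArm:
            """Belief about one coin's payoff rate as a Beta(alpha, beta) distribution."""
            def __init__(self, alpha=1.0, beta=1.0):
                self.alpha = alpha   # prior heads + 1
                self.beta = beta     # prior tails + 1

            def sample(self):
                # One plausible payoff rate drawn from the current belief,
                # a whole distribution rather than a single number (slide 15).
                return random.betavariate(self.alpha, self.beta)

            def update(self, reward):
                # Fold one observed flip back into the belief.
                if reward:
                    self.alpha += 1
                else:
                    self.beta += 1

        def thompson_step(arms, true_rates):
            # Explore and exploit in one move: play the arm whose sampled rate wins.
            i = max(range(len(arms)), key=lambda j: arms[j].sample())
            reward = random.random() < true_rates[i]
            arms[i].update(reward)
            return i, reward

        # Two coins with beliefs matching slides 17-18; the true rates are made up.
        arms = [BetaArm(5, 5), BetaArm(2, 10)]
        for _ in range(1000):
            thompson_step(arms, true_rates=[0.50, 0.45])

    As the beliefs sharpen, the better coin is sampled highest more and more often, so play concentrates on it without ever fully abandoning the other.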
    • 20. Demo
    • 21. An Example
    • 22. An Example
    • 23. The Cluster Proximity Features
      – Every point can be described by the nearest cluster
        – 4.3 bits per point in this case
        – Significant error that can be decreased (to a point) by increasing the number of clusters
      – Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities)
        – Error is negligible
        – Unwinds the data into a simple representation
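
    A sketch of how such a proximity description might be computed, assuming a fixed set of centroids; the function name and the random example data are made up for illustration:

        import numpy as np

        def proximity_features(points, centroids):
            """Describe each point by its two nearest clusters and its distance to each.

            With ~20 clusters the nearest-cluster id alone costs about
            log2(20) ≈ 4.3 bits; keeping the second id, both proximities,
            and a sign bit (which side of the bisector the point is on)
            makes the reconstruction error negligible.
            """
            # Pairwise distances, shape (n_points, n_clusters).
            d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            nearest2 = np.argsort(d, axis=1)[:, :2]         # ids of the two nearest clusters
            prox = np.take_along_axis(d, nearest2, axis=1)  # distances to those two
            return nearest2, prox

        # Example: 1,000 random 10-d points against 20 made-up cluster centers.
        rng = np.random.default_rng(0)
        ids, prox = proximity_features(rng.normal(size=(1000, 10)),
                                       rng.normal(size=(20, 10)))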
    • 24. Diagonalized Cluster Proximity
    • 25. Lots of Clusters Are Fine
    • 26. Surrogate Method
      – Start with sloppy clustering into κ = k log n clusters
      – Use these clusters as a weighted surrogate for the data
      – Cluster the surrogate data using ball k-means
      – Results are provably high quality for highly clusterable data
      – Sloppy clustering can be done on-line
      – The surrogate can be kept in memory
      – The ball k-means pass can be done at any time
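
    One common way to realize the sloppy on-line pass is a threshold-based streaming clusterer; the sketch below follows that pattern (the deck’s actual implementation may differ) and produces the weighted surrogate that a ball k-means pass would then reduce to the final k clusters:

        import numpy as np

        def sloppy_cluster(stream, kappa):
            """One-pass sloppy clustering into roughly kappa ≈ k log n weighted centroids.

            Each point merges into the nearest centroid if it is close enough,
            otherwise it starts a new centroid; when too many centroids pile up,
            the distance scale grows so later points merge more readily.
            (Real implementations also re-collapse existing centroids then.)
            """
            centroids, weights, dist = [], [], 1.0
            for x in stream:
                x = np.asarray(x, dtype=float)
                if centroids:
                    d2 = [float(np.dot(x - c, x - c)) for c in centroids]
                    i = int(np.argmin(d2))
                    if d2[i] < dist ** 2:
                        weights[i] += 1.0
                        centroids[i] += (x - centroids[i]) / weights[i]  # running mean
                        continue
                centroids.append(x.copy())
                weights.append(1.0)
                if len(centroids) > 2 * kappa:
                    dist *= 1.5  # grow the scale so future points merge more often
            return np.array(centroids), np.array(weights)

        # Example: a surrogate of ~300 weighted centroids for 10,000 random points.
        rng = np.random.default_rng(0)
        c, w = sloppy_cluster(rng.normal(size=(10_000, 10)), kappa=300)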
    • 27. Algorithm Costs
      – O(k d log n) per point for Lloyd’s algorithm … not so good for k = 2000, n = 10⁸
      – Surrogate methods: O(d log κ) = O(d (log k + log log n)) per point
      – This is a big deal:
        – k d log n = 2000 × 10 × 26 ≈ 500,000
        – d (log k + log log n) = 10 × (11 + 5) = 160
        – 3,000 times faster makes the grade as a bona fide big deal
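
    The slide’s arithmetic is easy to check directly (base-2 logs, costs in arbitrary per-point units):

        from math import log2

        k, d, n = 2000, 10, 1e8
        lloyd = k * d * log2(n)                    # ≈ 530,000 per point
        surrogate = d * (log2(k) + log2(log2(n)))  # ≈ 160 per point
        print(lloyd / surrogate)                   # ≈ 3,400: the "3,000 times faster"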
    • 28. 3,000 times faster sounds good
    • 29. 3,000 times faster sounds good … but that isn’t the big news
    • 30. 3,000 times faster sounds good … but that isn’t the big news: these algorithms do on-line clustering
    • 31. Parallel Speedup? [Plot: time per point (μs) versus number of threads, 1–16; the threaded version tracks the perfect-scaling line, the non-threaded version does not]
    • 32. What about deployment?
    • 33. [Diagram: an infinite data stream feeding into the learning state, repeated from slide 7]
    • 34. [Diagram: a single mapper holding learning state over its data split]
    • 35. [Diagram: several mappers over separate data splits; sharing the learning state needs shared memory!]
    • 36. whoami – Ted Dunning
      – We’re hiring at MapR
      – tdunning@maprtech.com, tdunning@apache.org, @ted_dunning
      – For slides and other info: http://www.slideshare.net/tdunning