How to Determine which Algorithms Really Matter

  • TED: consider using the word “interesting” instead of “anomalous”… people may think you are talking about anomaly detection…
  • Old joke: all the world can be divided into two categories: Scotch tape and non-Scotch tape. This is one way to think about co-occurrence.
  • The only important co-occurrence is that puppy follows apple.
  • Take that row of the matrix and combine it with all the metadata we might have.

    The important thing to get from the co-occurrence matrix is this indicator.
    Cool thing: this is analogous to what a lot of recommendation engines do.

    This row forms the indicator field in a Solr document containing metadata (you do NOT have to build a separate index for the indicators).
    Find the useful co-occurrence and get rid of the rest.
    Sparsify and keep only the anomalous co-occurrence.
  • Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
  • This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identifies significant or interesting co-occurrence).
    Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains the metadata for the item in question.
  • This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It’s a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine.

    In other words, do the indicator artists, represented by their indicator IDs, make reasonable recommendations?

    Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?
Transcript
    1. Which Algorithms Really Matter? (Hadoop Summit 2014)
    2. Me, Us
       • Ted Dunning, Chief Application Architect, MapR
         – Committer and PMC member: Mahout, ZooKeeper, Drill
         – Bought the beer at the first HUG
       • MapR
         – Distributes more open source components for Hadoop
         – Adds major technology for performance, HA, and industry-standard APIs
       • Info
         – Hash tag: #mapr
         – See also: @ApacheMahout, @ApacheDrill, @ted_dunning, and @mapR
    3. Topic For Today
       • What is important? What is not?
       • Why?
       • What is the difference from academic research?
       • Some examples
    4. What is Important?
       • Deployable
       • Robust
       • Transparent
       • Skillset and mindset matched?
       • Proportionate
    5. What is Important?
       • Deployable – Clever prototypes don’t count if they can’t be standardized
       • Robust
       • Transparent
       • Skillset and mindset matched?
       • Proportionate
    6. What is Important?
       • Deployable – Clever prototypes don’t count
       • Robust – Mishandling is common
       • Transparent – Will degradation be obvious?
       • Skillset and mindset matched?
       • Proportionate
    7. What is Important?
       • Deployable – Clever prototypes don’t count
       • Robust – Mishandling is common
       • Transparent – Will degradation be obvious?
       • Skillset and mindset matched? – How long will your fancy data scientist enjoy doing standard ops tasks?
       • Proportionate – Where is the highest value per minute of effort?
    8. Academic Goals vs Pragmatics
       • Academic goals
         – Reproducible
         – Isolate theoretically important aspects
         – Work on novel problems
       • Pragmatics
         – Highest net value
         – Available data is constantly changing
         – Diligence and consistency have larger impact than cleverness
         – Many systems feed themselves; exploration and exploitation are both important
         – Engineering constraints on budget and schedule
    9. Example 1: Making Recommendations Better
    10. Recommendation Advances
       • What are the most important algorithmic advances in recommendations over the last 10 years?
       • Co-occurrence analysis?
       • Matrix completion via factorization?
       • Latent factor log-linear models?
       • Temporal dynamics?
    11. The Winner – None of the Above
       • What are the most important algorithmic advances in recommendations over the last 10 years?
         1. Result dithering
         2. Anti-flood
    12. The Real Issues
       • Exploration
       • Diversity
       • Speed
       • Not the last fraction of a percent
    13. Result Dithering
       • Dithering is used to re-order recommendation results
         – Re-ordering is done randomly
       • Dithering is guaranteed to make off-line performance worse
       • Dithering also has a near-perfect record of making actual performance much better
    14. Result Dithering
       • Dithering is used to re-order recommendation results
         – Re-ordering is done randomly
       • Dithering is guaranteed to make off-line performance worse
       • Dithering also has a near-perfect record of making actual performance much better
       • “Made more difference than any other change”
    15. Simple Dithering Algorithm
       • Generate a synthetic score from log rank plus Gaussian noise: s = log r + N(0, ε)
       • Pick the noise scale ε to provide the desired level of mixing: Δr ∝ r · exp(ε)
       • Typically ε ∈ [0.4, 0.8]
       • Oh… use floor(t/T) as the random seed
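    A minimal sketch of this dithering step in Python. It assumes results arrive already ordered by the recommender’s score; the noise scale eps and the epoch length T (in seconds) are tuning parameters, and seeding on floor(t/T) is what keeps the order stable between page reloads within an epoch.

        import math
        import random

        def dither(results, eps=0.5, t=0.0, T=300.0):
            """Re-order ranked results by sampling a synthetic score
            s = log(rank) + N(0, eps) per item and sorting on it."""
            # Same seed for every request inside one T-second epoch,
            # so a reload shows the same dithered order.
            rng = random.Random(math.floor(t / T))
            scored = [(math.log(rank) + rng.gauss(0, eps), item)
                      for rank, item in enumerate(results, start=1)]
            return [item for _, item in sorted(scored)]

        # Top items move a little, the tail mixes a lot:
        print(dither(list(range(1, 21)), eps=0.5, t=1234567.0))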
    16. Example … ε = 0.5 (each row is one dithered ordering of results originally ranked 1, 2, 3, …)
        1  2  6  5  3  4 13 16
        1  2  3  8  5  7  6 34
        1  4  3  2  6  7 11 10
        1  2  4  3 15  7 13 19
        1  6  2  3  4 16  9  5
        1  2  3  5 24  7 17 13
        1  2  3  4  6 12  5 14
        2  1  3  5  7  6  4 17
        4  1  2  7  3  9  8  5
        2  1  5  3  4  7 13  6
        3  1  5  4  2  7  8  6
        2  1  3  4  7 12 17 16
    17. Example … ε = log 2 = 0.69
        1  2  8  3  9 15  7  6
        1  8 14 15  3  2 22 10
        1  3  8  2 10  5  7  4
        1  2 10  7  3  8  6 14
        1  5 33 15  2  9 11 29
        1  2  7  3  5  4 19  6
        1  3  5 23  9  7  4  2
        2  4 11  8  3  1 44  9
        2  3  1  4  6  7  8 33
        3  4  1  2 10 11 15 14
       11  1  2  4  5  7  3 14
        1  8  7  3 22 11  2 33
    18. Exploring The Second Page
    19. Lesson 1: Exploration is good
    20. Example 2: Bayesian Bandits
    21. Bayesian Bandits
       • Based on Thompson sampling
       • Very general sequential test
       • Near-optimal regret
       • Trades off exploration and exploitation
       • Possibly the best known solution for exploration/exploitation
       • Incredibly simple
    22. Thompson Sampling
       • Select each shell according to the probability that it is the best
       • The probability that it is the best can be computed using the posterior:
         P(i is best) = ∫ I[ E[r_i | θ] = max_j E[r_j | θ] ] P(θ | D) dθ
       • But I promised a simple answer
    23. Thompson Sampling – Take 2
       • Sample θ ~ P(θ | D)
       • Pick i to maximize reward: i = argmax_j E[r_j | θ]
       • Record the result from using i
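    A sketch of this loop in Python for the simplest case, Bernoulli rewards with Beta posteriors. (The regret plot on the next slide uses a Gamma-Normal model instead; the Beta-Bernoulli version below is just the shortest correct illustration of the sample/pick/record cycle.)

        import random

        class ThompsonSampler:
            """Bernoulli-reward Thompson sampling with Beta(1, 1) priors."""

            def __init__(self, n_arms):
                self.wins = [1] * n_arms    # alpha: prior + successes
                self.losses = [1] * n_arms  # beta: prior + failures

            def choose(self):
                # Sample theta_j ~ P(theta_j | D) for each arm, pick the argmax.
                samples = [random.betavariate(w, l)
                           for w, l in zip(self.wins, self.losses)]
                return max(range(len(samples)), key=samples.__getitem__)

            def update(self, arm, reward):
                # Record the result from using arm i.
                self.wins[arm] += 1 if reward else 0
                self.losses[arm] += 0 if reward else 1

        # Example: arm 1 pays off 12% of the time, arm 0 only 5%.
        bandit = ThompsonSampler(2)
        for _ in range(10000):
            arm = bandit.choose()
            bandit.update(arm, random.random() < (0.05, 0.12)[arm])
        print(bandit.wins, bandit.losses)  # nearly all pulls go to arm 1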
    24. Fast Convergence
        [plot: regret vs. n for n = 0 … 1000; the Bayesian Bandit with a Gamma-Normal model converges to low regret much faster than ε-greedy with ε = 0.05]
    25. Thompson Sampling on Ads
        An Empirical Evaluation of Thompson Sampling. Chapelle and Li, 2011.
    26. Bayesian Bandits versus Result Dithering
       • Many useful systems are difficult to frame in fully Bayesian form
       • Thompson sampling cannot be applied without posterior sampling
       • Can still do useful exploration with dithering
       • But better to use Thompson sampling if possible
    27. Lesson 2: Exploration is pretty easy to do and pays big benefits.
    28. Example 3: On-line Clustering
    29. The Problem
       • k-means clustering is useful for feature extraction or compression
       • At scale and at high dimension, the desirable number of clusters increases
       • A very large number of clusters may require more passes through the data
       • Super-linear scaling is generally infeasible
    30. The Solution
       • Sketch-based algorithms produce a sketch of the data
       • Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution
       • The size of the sketch grows very slowly with increasing data size
       • Many operations such as clustering are well behaved on sketches
        Fast and Accurate k-means for Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
        Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.
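    A toy one-pass version of that sketching idea in Python. Real implementations (e.g. Mahout’s streaming k-means) add approximate nearest-neighbor search and a more principled schedule for growing the distance threshold; the fixed 1.5× growth rule and the randomized merge test below are simplifying assumptions.

        import math
        import random

        def dp_means_sketch(points, max_sketch_size=100):
            """One-pass sketch: keep weighted centroids. Each new point
            either merges into its nearest centroid or starts a new one,
            depending on a threshold that grows as the sketch fills up."""
            centroids, weights = [], []
            threshold = 1e-3
            for p in points:
                if centroids:
                    j = min(range(len(centroids)),
                            key=lambda i: math.dist(p, centroids[i]))
                    d = math.dist(p, centroids[j])
                if not centroids or d > threshold * random.random():
                    centroids.append(list(p))   # start a new centroid
                    weights.append(1)
                else:
                    # Merge p into centroid j as a weighted mean.
                    w = weights[j]
                    centroids[j] = [(w * c + x) / (w + 1)
                                    for c, x in zip(centroids[j], p)]
                    weights[j] = w + 1
                if len(centroids) > max_sketch_size:
                    threshold *= 1.5  # coarsen; real code also recollapses the sketch
            return centroids, weights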
    31. An Example [figure]
    32. An Example [figure]
    33. The Cluster Proximity Features
       • Every point can be described by the nearest cluster
         – 4.3 bits per point in this case
         – Significant error that can be decreased (to a point) by increasing the number of clusters
       • Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities)
         – Error is negligible
         – Unwinds the data into a simple representation
       • Or we can increase the number of clusters (an n-fold increase adds log n bits per point and decreases error by sqrt(n))
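    A sketch of that two-nearest-clusters encoding in Python (assuming centroids come from a sketch like the one above; with about 20 centroids each cluster ID costs log2(20) ≈ 4.3 bits, which is where the slide’s arithmetic comes from):

        import math

        def proximity_features(p, centroids):
            """Describe point p by its two nearest centroids plus the two
            distances: (id1, id2, d1, d2). The IDs cost log2(k) bits each
            and the distances carry the residual, so reconstruction error
            is negligible."""
            order = sorted(range(len(centroids)),
                           key=lambda i: math.dist(p, centroids[i]))
            i1, i2 = order[0], order[1]
            return (i1, i2,
                    math.dist(p, centroids[i1]),
                    math.dist(p, centroids[i2]))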
    34. Diagonalized Cluster Proximity [figure]
    35. Lots of Clusters Are Fine
    36. Typical k-means Failure
        Selecting two seeds here cannot be fixed with Lloyd’s algorithm.
        The result is that these two clusters get glued together.
    37. Streaming k-means Ideas
       • By using a sketch with lots (k log N) of centroids, we avoid pathological cases
       • We still get a very good result if the sketch is created
         – in one pass
         – with approximate search
       • In fact, adaptive dp-means works just fine
       • In the end, the sketch can be used for clustering or …
    38. Lesson 3: Sketches make big data small.
    39. Example 4: Search Abuse
    40. Recommendation
        Alice got an apple and a puppy.
        Charles got a bicycle.
        Bob got an apple.
    41. Recommendation
        Alice got an apple and a puppy.
        Charles got a bicycle.
        Bob got an apple. What else would Bob like?
    42. Recommendation
        Alice got an apple and a puppy.
        Charles got a bicycle.
        A puppy!
    43. History Matrix: Users × Items
        [matrix figure: rows Alice, Bob, Charles; a ✔ marks each item in a user’s history]
    44. Co-Occurrence Matrix: Items × Items
        [matrix figure: counts of how often each pair of items appears together in user histories]
        Use the LLR test to turn co-occurrence counts into indicators of interesting co-occurrence.
    45. Indicator Matrix: Anomalous Co-Occurrence
        [matrix figure: only the interesting co-occurrence cells survive, marked ✔]
    46. Co-occurrence Binary Matrix
        [figure: 2×2 contingency table counting events where each of the two items did (“1”) or did not (“not 1”) occur]
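    That LLR (log-likelihood ratio, or G²) score is computed directly from the 2×2 table of counts. A sketch in Python following the entropy formulation used by Mahout’s LogLikelihood class:

        import math

        def x_log_x(x):
            return x * math.log(x) if x > 0 else 0.0

        def entropy(*counts):
            return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

        def llr(k11, k12, k21, k22):
            """G^2 score for a 2x2 contingency table: k11 = both items
            seen together, k12/k21 = one without the other, k22 = neither.
            Large scores flag interesting (anomalous) co-occurrence."""
            row = entropy(k11 + k12, k21 + k22)
            col = entropy(k11 + k21, k12 + k22)
            mat = entropy(k11, k12, k21, k22)
            return max(0.0, 2.0 * (row + col - mat))

        # Apple and puppy in the three-user example above:
        # Alice has both, Bob has apple only, Charles has neither.
        print(llr(k11=1, k12=1, k21=0, k22=1))  # ~1.05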
    47. Indicator Matrix: Anomalous Co-Occurrence
        [matrix figure with the surviving row marked ✔]
        Result: the marked row will be added to the indicator field in the item document…
    48. Indicator Matrix
        [the marked row, rendered as a Solr document]
        id: t4
        title: puppy
        desc: The sweetest little puppy ever.
        keywords: puppy, dog, pet
        indicators: (t1)
        That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
        Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don’t need to create a separate index for the indicators.
    49. Internals of the Recommender Engine [figure]
    50. Internals of the Recommender Engine [figure]
    51. Looking Inside LucidWorks
        Real-time recommendation query and results (evaluation).
        What to recommend if a new user listened to 2122: Fats Domino and 303: The Beatles?
        Recommendation is “1710: Chuck Berry”.
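    A sketch of what that deployment-time query looks like from application code. The host, collection name, and field name here are hypothetical; the only real requirement is an OR query of the user’s recent history IDs against the indicators field of a standard Solr select endpoint.

        import requests  # third-party: pip install requests

        def recommend(history_ids, solr="http://localhost:8983/solr/music"):
            """Recommendation as search: the user's recent item IDs become
            the query, matched against the 'indicators' field that holds
            the rows of the Mahout indicator matrix."""
            q = "indicators:(%s)" % " ".join(str(i) for i in history_ids)
            resp = requests.get(solr + "/select",
                                params={"q": q, "wt": "json", "rows": 10})
            return [doc["id"] for doc in resp.json()["response"]["docs"]]

        # New user listened to 2122 (Fats Domino) and 303 (The Beatles):
        print(recommend([2122, 303]))  # e.g. 1710 (Chuck Berry) near the top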
    52. Real-life example [figure]
    53. Lesson 4: Recursive search abuse pays
        – Search can implement recs
        – Which can implement search
    54. Summary
    55. [image slide, no text]
    56. Me, Us
       • Ted Dunning, Chief Application Architect, MapR
         – Committer and PMC member: Mahout, ZooKeeper, Drill
         – Bought the beer at the first HUG
       • MapR
         – Distributes more open source components for Hadoop
         – Adds major technology for performance, HA, and industry-standard APIs
       • Info
         – Hash tag: #mapr
         – See also: @ApacheMahout, @ApacheDrill, @ted_dunning, and @mapR
