Which Algorithms Really Matter?
Ted Dunning, Chief Application Architect, MapR
©MapR Technologies 2013


This is the position talk that I gave at CIKM. Included are four algorithms that I feel don't get much academic attention but are very important industrially. It isn't necessarily true that these algorithms *should* get academic attention, but they are quite important pragmatically speaking.

Speaker notes (recommendation walkthrough, Example 4):
  • A history of what everybody has done. Obviously this is just a cartoon: a real recommender would require large numbers of users and interactions with items. The next step is to predict what a new user might like.
  • Bob is the "new user", and getting an apple is his history.
  • Here is where the recommendation engine needs to go to work. (Note to trainer: see if the audience calls out the answer before revealing the next slide.)
  • (Note to trainer: this is similar to the situation we started with, with three users in our history. The difference is that now everybody got a pony. Bob has an apple and a pony, but not a puppy… yet.)
  • The binary matrix is stored sparsely.
  • Convert by MapReduce into a binary matrix. (Note to trainer: whether apple should be considered to have co-occurred with itself is an open question.)
  • Old joke: all the world can be divided into two categories, Scotch tape and non-Scotch tape. This is a way to think about co-occurrence.
  • The only important co-occurrence is that puppy follows apple.
  • Take that row of the matrix and combine it with all the metadata we might have. The important thing to get from the co-occurrence matrix is this indicator; the cool thing is that it is analogous to what a lot of recommendation engines do. This row forms the indicator field in a Solr document containing metadata (you do NOT have to build a separate index for the indicators). Find the useful co-occurrences and get rid of the rest: sparsify and keep the anomalous co-occurrences.
  • (Note to trainer: take a little time to explore this here and on the next couple of slides. Details are enlarged on the next slide.)
  • This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrences). Keep in mind that this indicator data is added to the same original document in the Solr index that contains the metadata for the item in question.
  • This is a diagnostics window into the LucidWorks Solr index (not the web interface a user would see). It's a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine: do the artists represented by the indicator IDs make reasonable recommendations? (Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?)

Me, Us
• Ted Dunning, Chief Application Architect, MapR
  – Committer and PMC member: Mahout, ZooKeeper, Drill
  – Bought the beer at the first HUG
• MapR
  – Distributes more open source components for Hadoop
  – Adds major technology for performance, HA, industry-standard APIs
• Info
  – Hash tag: #mapr
  – See also: @ApacheMahout @ApacheDrill @ted_dunning and @mapR

Topic For Today
• What is important? What is not?
• Why?
• What is the difference from academic research?
• Some examples
What is Important?
• Deployable
  – Clever prototypes don't count if they can't be standardized
• Robust
  – Mishandling is common
• Transparent
  – Will degradation be obvious?
• Skillset and mindset matched?
  – How long will your fancy data scientist enjoy doing standard ops tasks?
• Proportionate
  – Where is the highest value per minute of effort?
Academic Goals vs Pragmatics
• Academic goals
  – Reproducible
  – Isolate theoretically important aspects
  – Work on novel problems
• Pragmatics
  – Highest net value
  – Available data is constantly changing
  – Diligence and consistency have larger impact than cleverness
  – Many systems feed themselves; exploration and exploitation are both important
  – Engineering constraints on budget and schedule
Example 1: Making Recommendations Better

Recommendation Advances
• What are the most important algorithmic advances in recommendations over the last 10 years?
• Co-occurrence analysis?
• Matrix completion via factorization?
• Latent factor log-linear models?
• Temporal dynamics?

The Winner – None of the Above
• What are the most important algorithmic advances in recommendations over the last 10 years?
  1. Result dithering
  2. Anti-flood

The Real Issues
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent
Result Dithering
• Dithering is used to re-order recommendation results
  – Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near-perfect record of making actual performance much better
  – "Made more difference than any other change"
Simple Dithering Algorithm
• Generate a synthetic score from log rank plus Gaussian noise:

  s = \log r + N(0, \varepsilon)

• Pick the noise scale to provide the desired level of mixing:

  \Delta r \propto r \, e^{\varepsilon}

• Typically \varepsilon \in [0.4, 0.8]
• Oh… use floor(t/T) as the random seed, so the ordering stays stable within each time window of length T
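A minimal sketch of this dithering scheme in Python (the function name, the toy result list, and the 300-second window default are my own illustration, not from the talk):

```python
import math
import random

def dither(results, epsilon=0.5, t=None, window=300):
    """Re-order ranked results by sampling s = log(rank) + N(0, epsilon).

    `results` is a list ordered best-first. Seeding the RNG with
    floor(t / window) keeps the ordering stable within each time window,
    as the slide suggests.
    """
    rng = random.Random(math.floor(t / window)) if t is not None else random.Random()
    scored = [(math.log(rank) + rng.gauss(0, epsilon), item)
              for rank, item in enumerate(results, start=1)]
    return [item for _, item in sorted(scored)]

# Example: dither the top-8 results for one request.
print(dither(list("ABCDEFGH"), epsilon=0.5, t=1_700_000_000))
```

Note the trade-off the slides describe: every call scrambles the tail of the list a little, which hurts any off-line metric computed against the undithered ranking, but exposes deeper results to real users.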
Example … ε = 0.5
[Table: a dozen dithered orderings of results originally ranked 1, 2, 3, …; with ε = 0.5 the top-ranked items mostly stay near the top, while items from moderately deep in the list occasionally surface.]

Example … ε = log 2 ≈ 0.69
[Table: the same experiment with larger noise; mixing is stronger, and items from much deeper in the original ranking (into the 30s and 40s) now reach the first page.]
Exploring The Second Page
[Figure]

Lesson 1: Exploration is good

Example 2: Bayesian Bandits
Bayesian Bandits
• Based on Thompson sampling
• Very general sequential test
• Near-optimal regret
• Trades off exploration and exploitation
• Possibly the best known solution for exploration/exploitation
• Incredibly simple

Thompson Sampling
• Select each shell according to the probability that it is the best
• The probability that it is the best can be computed using the posterior:

  P(i \text{ is best}) = \int \mathbb{I}\big[\, E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \,\big] \, P(\theta \mid D) \, d\theta

• But I promised a simple answer
Thompson Sampling – Take 2
• Sample θ:

  \theta \sim P(\theta \mid D)

• Pick i to maximize reward:

  i = \arg\max_j E[r_j \mid \theta]

• Record the result from using i
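A minimal Beta-Bernoulli sketch of this loop in Python (the Bernoulli reward model, class, and variable names are my own illustration; the regret plot on the next slide uses a Gamma-Normal model instead):

```python
import random

class ThompsonBandit:
    """Thompson sampling for Bernoulli-reward arms with Beta(1, 1) priors."""

    def __init__(self, n_arms):
        self.wins = [1] * n_arms    # alpha: prior + observed successes
        self.losses = [1] * n_arms  # beta: prior + observed failures

    def choose(self):
        # Sample theta ~ P(theta | D) for each arm, then pick the argmax.
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def record(self, arm, reward):
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1

# Example: three arms with hidden success rates. The bandit converges to
# pulling the best arm while still exploring the others occasionally.
rates = [0.05, 0.11, 0.08]
bandit = ThompsonBandit(len(rates))
for _ in range(1000):
    arm = bandit.choose()
    bandit.record(arm, random.random() < rates[arm])
print(bandit.wins, bandit.losses)
```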
Fast Convergence
[Figure: regret versus number of trials n (0 to 1100), comparing ε-greedy with ε = 0.05 against a Bayesian Bandit with a Gamma-Normal model; the Bayesian Bandit's regret falls much faster.]
Thompson Sampling on Ads
[Figure from "An Empirical Evaluation of Thompson Sampling", Chapelle and Li, 2011]

Bayesian Bandits versus Result Dithering
• Many useful systems are difficult to frame in fully Bayesian form
• Thompson sampling cannot be applied without posterior sampling
• Can still do useful exploration with dithering
• But better to use Thompson sampling if possible

Lesson 2: Exploration is pretty easy to do and pays big benefits.

Example 3: On-line Clustering
The Problem
• k-means clustering is useful for feature extraction or compression
• At scale and at high dimension, the desirable number of clusters increases
• Very large numbers of clusters may require more passes through the data
• Super-linear scaling is generally infeasible
The Solution
• Sketch-based algorithms produce a sketch of the data
• Streaming k-means uses adaptive dp-means to produce this sketch, in the form of many weighted centroids which approximate the original distribution
• The size of the sketch grows very slowly with increasing data size
• Many operations, such as clustering, are well behaved on sketches

References:
– Fast and Accurate k-means for Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
– Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.
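A loose one-pass sketch of the adaptive dp-means idea in Python, under stated simplifications: the threshold-doubling rule stands in for the proper re-clustering step of the cited algorithms, and the names are mine, not the Mahout implementation:

```python
import math
import random

def dp_means_sketch(points, max_centroids):
    """One-pass sketch: keep weighted centroids, spawning a new centroid
    when a point falls far from every existing one."""
    centroids, weights = [], []
    threshold = 1e-6
    for p in points:
        if centroids:
            d, i = min((math.dist(p, c), i) for i, c in enumerate(centroids))
        else:
            d, i = float("inf"), -1
        # Spawn a new centroid with probability that grows with distance.
        if d > threshold * random.random():
            centroids.append(list(p))
            weights.append(1)
        else:
            w = weights[i]
            centroids[i] = [(w * ci + pi) / (w + 1)
                            for ci, pi in zip(centroids[i], p)]
            weights[i] = w + 1
        # If the sketch grows too big, raise the threshold. (A real
        # implementation would also re-cluster the centroids here.)
        if len(centroids) > max_centroids:
            threshold *= 2
    return centroids, weights

# Example: sketch 10,000 random 2-D points into a few hundred centroids.
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
centroids, weights = dp_means_sketch(pts, max_centroids=200)
print(len(centroids), sum(weights))
```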
An Example
[Figure]

An Example
[Figure]
The Cluster Proximity Features
• Every point can be described by the nearest cluster
  – 4.3 bits per point in this case
  – Significant error that can be decreased (to a point) by increasing the number of clusters
• Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities)
  – Error is negligible
  – Unwinds the data into a simple representation
• Or we can increase the number of clusters (an n-fold increase adds log n bits per point and decreases error by sqrt(n))
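A small sketch of the two-nearest-clusters encoding in Python (the function and field names are my own illustration; it records the two centroid ids plus the two proximities from the slide, omitting the sign-bit detail):

```python
import math

def proximity_features(point, centroids):
    """Describe a point by its two nearest centroids and the distances to
    each. Assumes at least two centroids exist."""
    dists = sorted((math.dist(point, c), i) for i, c in enumerate(centroids))
    (d1, i1), (d2, i2) = dists[0], dists[1]
    return {"nearest": i1, "second": i2, "d1": d1, "d2": d2}
```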
Diagonalized Cluster Proximity
[Figure]

Lots of Clusters Are Fine
[Figure]
Typical k-means Failure
[Figure]
• Selecting two seeds here cannot be fixed with Lloyd's algorithm
• The result is that these two clusters get glued together
Streaming k-means Ideas
• By using a sketch with lots (k log N) of centroids, we avoid pathological cases
• We still get a very good result if the sketch is created
  – in one pass
  – with approximate search
• In fact, adaptive dp-means works just fine
• In the end, the sketch can be used for clustering or …

Lesson 3: Sketches make big data small.

Example 4: Search Abuse
Recommendations
• Alice got an apple and a puppy
• Charles got a bicycle

Recommendations
• Alice got an apple and a puppy
• Bob got an apple
• Charles got a bicycle

Recommendations
• What else would Bob like?

Log Files
[Figure: raw log entries, one user-item event per line, pairing Alice, Bob, and Charles with the items they got]

History Matrix: Users by Items
[Table: a binary users-by-items matrix with a ✔ wherever a user got an item]
Co-occurrence Matrix: Items by Items
• How do you tell which co-occurrences are useful?
[Table: an items-by-items matrix of co-occurrence counts built from the user histories]

Co-occurrence Binary Matrix
[Table: the co-occurrence matrix reduced to binary entries; only some of these turn out to be useful]

Indicator Matrix: Anomalous Co-Occurrence
• Result: the marked row will be added to the indicator field in the item document
[Figure: the indicator matrix with the single anomalous co-occurrence checked]
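The slides don't show the test itself, but the standard way to decide which co-occurrences are anomalous (the one used by Mahout's indicator computation, and due to Dunning's G² test) is the log-likelihood ratio. A sketch in Python; the counting interface and threshold usage are my own illustration:

```python
import math

def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 co-occurrence table:
    k11 = both items, k12 = item A only, k21 = item B only, k22 = neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

# Keep a co-occurrence as an indicator only if its score exceeds a chosen
# cutoff; everything else is sparsified away.
print(llr(k11=2, k12=0, k21=1, k22=4))
```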
Indicator Matrix
That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

  id: t4
  title: puppy
  desc: The sweetest little puppy ever.
  keywords: puppy, dog, pet
  indicators: (t1)

Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don't need to create a separate index for the indicators.
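Once the indicators are indexed, recommendation is just an ordinary search against the indicator field. A hypothetical sketch using the pysolr client (the core name, URL, and document are illustrative, not from the talk):

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/items")

# Index an item document whose indicators field holds the anomalous
# co-occurrences found offline (here: apple, t1, indicates puppy, t4).
solr.add([{
    "id": "t4",
    "title": "puppy",
    "keywords": ["puppy", "dog", "pet"],
    "indicators": ["t1"],
}], commit=True)

# Recommend for a user whose history contains t1: search the indicator
# field with the user's recent items as the query.
results = solr.search("indicators:(t1)", rows=10)
for doc in results:
    print(doc["id"], doc["title"])
```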
Internals of the Recommender Engine
[Figure]

Internals of the Recommender Engine
[Figure]

Looking Inside LucidWorks
• Real-time recommendation query and results: evaluation
• What to recommend if a new user listened to 2122: Fats Domino and 303: Beatles?
• Recommendation is "1710: Chuck Berry"

Real-life example
[Figure]
Lesson 4: Recursive search abuse pays
• Search can implement recs
• Which can implement search

Summary