© 2014 MapR Technologies 1
© MapR Technologies, confidential
Hadoop Summit 2014
Which Algorithms Really Matter?
© 2014 MapR Technologies 2
Me, Us
• Ted Dunning, Chief Application Architect, MapR
Committer and PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG
• MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, and industry-standard APIs
• Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
© 2014 MapR Technologies 4
Topic For Today
• What is important? What is not?
• Why?
• What is the difference from academic research?
• Some examples
© 2014 MapR Technologies 5
What is Important?
• Deployable
• Robust
• Transparent
• Skillset and mindset matched?
• Proportionate
© 2014 MapR Technologies 6
What is Important?
• Deployable
– Clever prototypes don’t count if they can’t be standardized
• Robust
• Transparent
• Skillset and mindset matched?
• Proportionate
© 2014 MapR Technologies 7
What is Important?
• Deployable
– Clever prototypes don’t count
• Robust
– Mishandling is common
• Transparent
– Will degradation be obvious?
• Skillset and mindset matched?
• Proportionate
© 2014 MapR Technologies 8
What is Important?
• Deployable
– Clever prototypes don’t count
• Robust
– Mishandling is common
• Transparent
– Will degradation be obvious?
• Skillset and mindset matched?
– How long will your fancy data scientist enjoy doing standard ops tasks?
• Proportionate
– Where is the highest value per minute of effort?
© 2014 MapR Technologies 9
Academic Goals vs Pragmatics
• Academic goals
– Reproducible
– Isolate theoretically important aspects
– Work on novel problems
• Pragmatics
– Highest net value
– Available data is constantly changing
– Diligence and consistency have larger impact than cleverness
– Many systems feed themselves; exploration and exploitation are both important
– Engineering constraints on budget and schedule
© 2014 MapR Technologies 10
Example 1:
Making Recommendations Better
© 2014 MapR Technologies 11
Recommendation Advances
• What are the most important algorithmic advances in
recommendations over the last 10 years?
• Cooccurrence analysis?
• Matrix completion via factorization?
• Latent factor log-linear models?
• Temporal dynamics?
© 2014 MapR Technologies 12
The Winner – None of the Above
• What are the most important algorithmic advances in
recommendations over the last 10 years?
1. Result dithering
2. Anti-flood
© 2014 MapR Technologies 13
The Real Issues
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent
© 2014 MapR Technologies 14
Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual
performance much better
© 2014 MapR Technologies 15
Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual
performance much better
“Made more difference than any other change”
© 2014 MapR Technologies 16
Simple Dithering Algorithm
• Generate a synthetic score from the log rank plus Gaussian noise:
  s = log r + N(0, ε)
• Pick the noise scale to provide the desired level of mixing:
  Δr ∝ r exp(ε)
• Typically ε ∈ [0.4, 0.8]
• Oh… and use floor(t/T) as the random seed, so the ordering is stable within each time window of length T
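A minimal sketch of this in Python (the function and parameter names are mine, not from the talk):

```python
import math
import random

def dither(results, eps, t, window):
    """Re-rank results by the synthetic score s = log(rank) + N(0, eps).

    Seeding with floor(t / window) keeps the ordering stable within a
    time window of length `window` but reshuffles it in the next one.
    """
    rng = random.Random(t // window)
    scored = [(math.log(rank) + rng.gauss(0, eps), item)
              for rank, item in enumerate(results, start=1)]
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]

# Each row in the example tables that follow corresponds to one such
# call with a different seed, e.g.:
# dither(list(range(1, 41)), eps=0.5, t=0, window=1)[:8]
```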
© 2014 MapR Technologies 17
Example … ε = 0.5
1 2 6 5 3 4 13 16
1 2 3 8 5 7 6 34
1 4 3 2 6 7 11 10
1 2 4 3 15 7 13 19
1 6 2 3 4 16 9 5
1 2 3 5 24 7 17 13
1 2 3 4 6 12 5 14
2 1 3 5 7 6 4 17
4 1 2 7 3 9 8 5
2 1 5 3 4 7 13 6
3 1 5 4 2 7 8 6
2 1 3 4 7 12 17 16
© 2014 MapR Technologies 18
Example … ε = log 2 = 0.69
1 2 8 3 9 15 7 6
1 8 14 15 3 2 22 10
1 3 8 2 10 5 7 4
1 2 10 7 3 8 6 14
1 5 33 15 2 9 11 29
1 2 7 3 5 4 19 6
1 3 5 23 9 7 4 2
2 4 11 8 3 1 44 9
2 3 1 4 6 7 8 33
3 4 1 2 10 11 15 14
11 1 2 4 5 7 3 14
1 8 7 3 22 11 2 33
© 2014 MapR Technologies 19
Exploring The Second Page
© 2014 MapR Technologies 20
Lesson 1:
Exploration is good
© 2014 MapR Technologies 21
Example 2:
Bayesian Bandits
© 2014 MapR Technologies 22
Bayesian Bandits
• Based on Thompson sampling
• Very general sequential test
• Near optimal regret
• Trade-off exploration and exploitation
• Possibly best known solution for exploration/exploitation
• Incredibly simple
© 2014 MapR Technologies 23
Thompson Sampling
• Select each shell (i.e., each bandit arm) according to the probability that it is the best
• The probability that it is the best can be computed from the posterior
• But I promised a simple answer
P(i is best) = ∫ I[ E[rᵢ | θ] = maxⱼ E[rⱼ | θ] ] P(θ | D) dθ
© 2014 MapR Technologies 24
Thompson Sampling – Take 2
• Sample θ ~ P(θ | D)
• Pick i = argmaxⱼ E[rⱼ | θ] to maximize expected reward
• Record the result from using i
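For concreteness, here is a minimal sketch for Bernoulli rewards with conjugate Beta posteriors (the talk's formulation is general; this special case is my choice):

```python
import random

class ThompsonBandit:
    """Thompson sampling over Bernoulli reward rates with Beta posteriors."""

    def __init__(self, n_arms):
        # Beta(1, 1) uniform prior on each arm's success probability
        self.wins = [1] * n_arms
        self.losses = [1] * n_arms

    def choose(self):
        # Sample theta ~ P(theta | D) for each arm, then play the arm
        # whose sampled rate is largest (the argmax step above).
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Record the result from using arm i.
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1
```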
© 2014 MapR Technologies 25
Fast Convergence
[Plot: cumulative regret vs. n (0 to 1000) for ε-greedy with ε = 0.05 and for a Bayesian Bandit with a Gamma-Normal prior; the Bayesian Bandit's regret flattens much sooner.]
© 2014 MapR Technologies 26
Thompson Sampling on Ads
An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011
© 2014 MapR Technologies 27
Bayesian Bandits versus Result Dithering
• Many useful systems are difficult to frame in fully Bayesian form
• Thompson sampling cannot be applied without posterior
sampling
• Can still do useful exploration with dithering
• But better to use Thompson sampling if possible
© 2014 MapR Technologies 28
Lesson 2:
Exploration is pretty easy to
do and pays big benefits.
© 2014 MapR Technologies 29
Example 3:
On-line Clustering
© 2014 MapR Technologies 30
The Problem
• K-means clustering is useful for feature extraction or
compression
• At scale and at high dimension, the desirable number of clusters
increases
• Very large number of clusters may require more passes through
the data
• Super-linear scaling is generally infeasible
© 2014 MapR Technologies 31
The Solution
• Sketch-based algorithms produce a sketch of the data
• Streaming k-means uses adaptive dp-means to produce this
sketch in the form of many weighted centroids which
approximate the original distribution
• The size of the sketch grows very slowly with increasing data
size
• Many operations such as clustering are well behaved on
sketches
Fast and Accurate k-means for Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.
© 2014 MapR Technologies 32
An Example
© 2014 MapR Technologies 33
An Example
© 2014 MapR Technologies 34
The Cluster Proximity Features
• Every point can be described by the nearest cluster
– 4.3 bits per point in this case
– Significant error that can be decreased (to a point) by increasing the number of clusters
• Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities)
– Error is negligible
– Unwinds the data into a simple representation
• Or we can increase the number of clusters (an n-fold increase adds log n bits per point and decreases error by sqrt(n))
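A rough sketch of the two-nearest-clusters encoding (the sign bit above is omitted for brevity; the names are mine):

```python
import math

def proximity_features(point, centroids):
    """Describe a point by its two nearest centroids (about log2(k)
    bits each for the indices) plus the two distances."""
    ranked = sorted(range(len(centroids)),
                    key=lambda i: math.dist(point, centroids[i]))
    i1, i2 = ranked[0], ranked[1]
    return (i1, i2,
            math.dist(point, centroids[i1]),
            math.dist(point, centroids[i2]))
```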
© 2014 MapR Technologies 35
Diagonalized Cluster Proximity
© 2014 MapR Technologies 36
Lots of Clusters Are Fine
© 2014 MapR Technologies 37
Typical k-means Failure
Selecting two seeds here cannot be fixed with Lloyd's algorithm.
The result is that these two clusters get glued together.
© 2014 MapR Technologies 38
Streaming k-means Ideas
• By using a sketch with lots (k log N) of centroids, we avoid
pathological cases
• We still get a very good result if the sketch is created
– in one pass
– with approximate search
• In fact, adaptive dp-means works just fine
• In the end, the sketch can be used for clustering or …
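A toy, one-pass sketcher in the dp-means spirit (heavily simplified relative to the cited algorithms; the threshold schedule here is an assumption):

```python
import math

def sketch(points, threshold=1.0, growth=1.1):
    """One pass over the data: each point joins its nearest centroid if
    it lies within the current distance threshold, otherwise it founds
    a new weighted centroid. Growing the threshold slowly keeps the
    centroid count manageable (the real algorithms target ~k log N)."""
    centroids = []  # each entry is [coordinates, weight]
    for p in points:
        if centroids:
            best = min(centroids, key=lambda c: math.dist(p, c[0]))
            if math.dist(p, best[0]) < threshold:
                # fold the point into the nearest centroid
                w = best[1]
                best[0] = [(w * x + y) / (w + 1)
                           for x, y in zip(best[0], p)]
                best[1] = w + 1
                continue
        centroids.append([list(p), 1])
        threshold *= growth  # grow the distance scale as data arrives
    return centroids
```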
© 2014 MapR Technologies 39
Lesson 3:
Sketches make big data small.
© 2014 MapR Technologies 40
Example 4:
Search Abuse
© 2014 MapR Technologies 41
Recommendation
Alice: got an apple and a puppy
Charles: got a bicycle
Bob: got an apple
© 2014 MapR Technologies 42
Recommendation
Alice: got an apple and a puppy
Charles: got a bicycle
Bob: got an apple. What else would Bob like?
© 2014 MapR Technologies 43
Recommendation
Alice: got an apple and a puppy
Charles: got a bicycle
Bob: a puppy!
© 2014 MapR Technologies 44
History Matrix: Users × Items
[Diagram: a users × items matrix with rows Alice, Bob, and Charles and a ✔ marking each item a user got.]
© 2014 MapR Technologies 45
Co-Occurrence Matrix: Items × Items
[Diagram: an items × items matrix of co-occurrence counts.]
Use the LLR test to turn co-occurrence counts into indicators of interesting co-occurrence.
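The LLR score itself is easy to compute; this sketch follows the standard formulation used in Mahout (function names are mine):

```python
import math

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 co-occurrence table:
    k11 = both items together, k12/k21 = one without the other,
    k22 = neither. Large scores flag interesting co-occurrence."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    return 2.0 * (row + col - entropy(k11, k12, k21, k22))
```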
© 2014 MapR Technologies 46
Indicator Matrix: Anomalous Co-Occurrence
[Diagram: indicator matrix with ✔ marking the anomalous (interesting) co-occurrences.]
© 2014 MapR Technologies 47
Co-occurrence Binary Matrix
[Diagram: the co-occurrence counts reduced to a binary matrix; each cell is either 1 or not.]
© 2014 MapR Technologies 48
Indicator Matrix: Anomalous Co-Occurrence
[Diagram: indicator matrix with ✔ marking the anomalous co-occurrences.]
Result: the marked row will be added to the indicator field in the item document…
© 2014 MapR Technologies 49
Indicator Matrix
[Diagram: one ✔-marked row of the indicator matrix, feeding the Solr document below.]
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don't need to create a separate index for the indicators.
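Serving recommendations then reduces to an ordinary search query; something like this hypothetical sketch (the URL, core, and field names are assumptions):

```python
import requests

def recommend(solr_url, recent_items, rows=10):
    """Search the user's recent item ids against the indicators field
    of the item index; the matching documents are the recommendations."""
    query = " OR ".join("indicators:%s" % item for item in recent_items)
    resp = requests.get(solr_url + "/select",
                        params={"q": query, "wt": "json", "rows": rows})
    return [doc["id"] for doc in resp.json()["response"]["docs"]]

# e.g. recommend("http://localhost:8983/solr/items", ["t1", "t3"])
```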
© 2014 MapR Technologies 50
Internals of the Recommender Engine
© 2014 MapR Technologies 51
Internals of the Recommender Engine
© 2014 MapR Technologies 52
Looking Inside LucidWorks
What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is "1710: Chuck Berry"
Real-time recommendation query and results: evaluation
© 2014 MapR Technologies 53
Real-life example
© 2014 MapR Technologies 54
Lesson 4:
Recursive search abuse pays
Search can implement recs
Which can implement search
© 2014 MapR Technologies 55
Summary
© 2014 MapR Technologies 56
© 2014 MapR Technologies 57
Me, Us
• Ted Dunning, Chief Application Architect, MapR
Committer and PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG
• MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, and industry-standard APIs
• Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR

Editor's Notes

  • #47 TED: consider using the word "interesting" instead of "anomalous"… people may think you are talking about anomaly detection…
  • #48 Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence matrix.
  • #49 The only important co-occurrence is that puppy follows apple.
  • #50 Take that row of the matrix and combine it with all the metadata we might have. The important thing to get from the co-occurrence matrix is this indicator. Cool thing: this is analogous to what a lot of recommendation engines do. This row forms the indicator field in a Solr document containing metadata (you do NOT have to build a separate index for the indicators). Find the useful co-occurrence and get rid of the rest. Sparsify and keep the anomalous co-occurrence.
  • #51 Note to trainer: take a little time to explore this here and on the next couple of slides. Details are enlarged on the next slide.
  • #52 This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrence). Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains the metadata for the item in question.
  • #53 This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It's a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine. In other words, do the indicator artists, represented by their indicator ids, make reasonable recommendations? Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?