How to tell which algorithms really matter

How to Tell Which Algorithms Really
Matter
Ted Dunning
MapR Technologies

© 2014 MapR Technologies 3
00:011.65TB
WITH 298 SERVERS

129K
RECCOMENDATIONS
00:02

Advertising
Automation
Cloud
Sellers
Cloud
Buyers
Cloud
63M
AD AUCTIONS
00:03

00:04422.2K
GENETIC SEQUENCES

Largest Biometric
Database
00:054.73M
AUTHENTICATIONS

© 2014 MapR Technologies 8© 2014 MapR Technologies
But How is This Done?
What really matters?

Topic For Today
• What is important? What is not?
• Why?
• What is the difference from academic research?
• Some examples

What is Important?
• Deployable
• Robust
• Transparent
• Skillset and mindset matched?
• Proportionate

What is Important?
• Deployable
– Clever prototypes don’t count if they can’t be standardized
• Robust
• Transparent
• Proportionate

What is Important?
• Deployable
– Clever prototypes don’t count
• Robust
– Mishandling is common
• Transparent
– Will degradation be obvious?
• Proportionate

What is Important?
• Deployable
– Clever prototypes don’t count
• Robust
– Mishandling is common
• Transparent
– Will degradation be obvious?
– How long will your fancy data scientist enjoy doing standard ops tasks?
• Proportionate
– Where is the highest value per minute of effort?

Academic Goals vs Pragmatics
• Academic goals
– Reproducible
– Isolate theoretically important aspects
– Work on novel problems
• Pragmatics
– Highest net value
– Available data is constantly changing
– Diligence and consistency have larger impact than cleverness
– Many systems feed themselves, exploration and exploitation are both
important
– Engineering constraints on budget and schedule

Example 1:
Making Recommendations Better

Recommendation Advances
• What are the most important algorithmic advances in
recommendations over the last 10 years?
• Cooccurrence analysis?
• Matrix completion via factorization?
• Latent factor log-linear models?
• Temporal dynamics?

The Winner – None of the Above
• What are the most important algorithmic advances in
recommendations over the last 10 years?
1. Result dithering (random noise)
2. Anti-flood (don’t repeat yourself)

The Real Issues
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent

Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual
performance much better

Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual
performance much better
“Made more difference than any other change”

Example … ε = 0.5
1 2 6 5 3 4 13 16
1 2 3 8 5 7 6 34
1 4 3 2 6 7 11 10
1 2 4 3 15 7 13 19
1 6 2 3 4 16 9 5
1 2 3 5 24 7 17 13
1 2 3 4 6 12 5 14
2 1 3 5 7 6 4 17
4 1 2 7 3 9 8 5
2 1 5 3 4 7 13 6
3 1 5 4 2 7 8 6
2 1 3 4 7 12 17 16

Example … ε = log 2 = 0.69
1 2 8 3 9 15 7 6
1 8 14 15 3 2 22 10
1 3 8 2 10 5 7 4
1 2 10 7 3 8 6 14
1 5 33 15 2 9 11 29
1 2 7 3 5 4 19 6
1 3 5 23 9 7 4 2
2 4 11 8 3 1 44 9
2 3 1 4 6 7 8 33
3 4 1 2 10 11 15 14
11 1 2 4 5 7 3 14
1 8 7 3 22 11 2 33

Exploring The Second Page

Lesson 1:
Exploration is good

Example 2:
Bayesian Bandits

Bayesian Bandits
• Based on Thompson sampling
• Very general sequential test
• Near optimal regret
• Trade-off exploration and exploitation
• Possibly best known solution for exploration/exploitation
• Incredibly simple

Fast Convergence
11000 100 200 300 400 500 600 700 800 900 1000
0.12
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
0.11
n
regret
ε- greedy, ε = 0.05
Bayesian Bandit with Gamma- Normal

Thompson Sampling on Ads
An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011

Bayesian Bandits versus Result Dithering
• Many useful systems are difficult to frame in fully Bayesian form
• Thompson sampling cannot be applied without posterior
sampling
• Can still do useful exploration with dithering
• But better to use Thompson sampling if possible

Lesson 2:
Exploration is easy to do and
pays big benefits.

Example 3:
On-line Clustering

The Problem
• K-means clustering is useful for feature extraction or
compression
• At scale and at high dimension, the desirable number of clusters
increases
• Very large number of clusters may require more passes through
the data
• Super-linear scaling is generally infeasible

The Solution
• Sketch-based algorithms produce a sketch of the data
• Streaming k-means uses adaptive dp-means to produce this
sketch in the form of many weighted centroids which
approximate the original distribution
• The size of the sketch grows very slowly with increasing data
size
• Many operations such as clustering are well behaved on
sketches
Fast and Accurate k-means For Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
Revisiting k-means: New Algorithms via Bayesian Nonparametrics . Brian Kulis, Michael Jordan.

An Example

Streaming k-means Ideas
• By using a sketch with lots (k log N) of centroids, we avoid
pathological cases
• We still get a very good result if the sketch is created
– in one pass
– with approximate search
• In fact, adaptive dp-means works just fine
• In the end, the sketch can be used for clustering or …

Lesson 3:
Sketches make big data small.

Example 4:
Search Abuse

Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Alice
Charles

Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
Alice
Bob
Charles

Recommendations
What else would Bob like??
Alice
Bob
Charles

Log Files
Alice
Bob
Charles
Alice
Bob
Charles
Alice

History Matrix: Users by Items
Alice
Bob
Charles
✔ ✔ ✔
✔ ✔
✔ ✔

Co-occurrence Matrix: Items by Items
-
1 2
1 1
1
1
2 1
How do you tell which co-occurrences are useful?.
0
0
0 0

Indicator Matrix: Anomalous Co-Occurrence
✔
✔
Result: The marked row will be added to the indicator field in the
item document…

Indicator Matrix
✔
id: t4
title: puppy
desc: The sweetest little puppy
ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the
Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to meta-data for a document in Solr
index. You don’t need to create a separate index for the indicators.

Internals of the Recommender Engine
56

Real-life example

Lesson 4:
Recursive search abuse pays
Search can implement recs
Which can implement search

How Does This Apply?

How Can I Start?

Q&A
@ted_dunning @mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

How to tell which algorithms really matter

How to tell which algorithms really matter

More Related Content

What's hot

Viewers also liked

Similar to How to tell which algorithms really matter

More from DataWorks Summit

Recently uploaded

How to tell which algorithms really matter

Editor's Notes