© 2014 MapR Technologies
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
Agenda
• Background – recommending with puppies and ponies
• Speed tricks
• Accuracy tricks
• Moving to real-time
Puppies and Ponies
Cooccurrence Analysis
How Often Do Items Co-occur?
A'A is the item × item cooccurrence matrix computed from the user × item history matrix A.
Which Co-occurrences are Interesting?
Each row of indicators becomes a field in a search engine document.
indicators = sparsify(A'A)
Recommendations
Alice got an apple and a puppy.
Recommendations
Alice got an apple and a puppy.
Charles got a bicycle.
Recommendations
Alice got an apple and a puppy.
Charles got a bicycle.
Bob got an apple.
Recommendations
Alice got an apple and a puppy.
Charles got a bicycle.
Bob got an apple. What else would Bob like?
Recommendations
Alice got an apple and a puppy.
Charles got a bicycle.
Bob got an apple. What else would Bob like? A puppy!
By the way, like me, Bob also
wants a pony…
Recommendations
[Figure: purchase histories for Alice, Bob, Charles, and new user Amelia]
What if everybody gets a pony? What else would you recommend for new user Amelia?
Recommendations
[Figure: the same histories, with everybody getting a pony]
If everybody gets a pony, it's not a very good indicator of what else to predict...
Problems with Raw Co-occurrence
• Very popular items co-occur with everything, which is why it's not very helpful to know that everybody wants a pony…
– Examples: Welcome document; Elevator music
• Very widespread occurrence is not interesting for generating recommendation indicators
– Unless you want to offer an item that is constantly desired, such as
razor blades (or ponies)
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to
base recommendation
Overview: Get Useful Indicators from Behaviors
1. Use log files to build a history matrix of users × items (sketched in code below)
– Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform it into a co-occurrence matrix of items × items
3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix
– The log-likelihood ratio (LLR) is helpful for judging which co-occurrences can be used with confidence as indicators of preference
– ItemSimilarityJob in Apache Mahout uses LLR
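Step 1 is easy to prototype outside of Mahout. Here is a minimal Python sketch (not the Mahout implementation); the log file name and the tab-separated timestamp/user/item format are assumptions to adapt to real logs:

```python
from collections import defaultdict

def build_history(log_lines):
    """Step 1: build a sparse users x items history from interaction logs.

    Assumes a hypothetical tab-separated log format of
    "timestamp<TAB>user_id<TAB>item_id"; adapt the parsing to the real logs.
    """
    history = defaultdict(set)      # user_id -> set of item_ids (row Ai* of A)
    for line in log_lines:
        _timestamp, user, item = line.rstrip("\n").split("\t")
        history[user].add(item)     # binary interactions; repeats collapse
    return history

# Usage: history = build_history(open("interactions.log"))
```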
Which one is the anomalous co-occurrence?

(1)           A      not A
    B        13      1,000
    not B  1,000   100,000

(2)           A      not A
    B         1          0
    not B     0     10,000

(3)           A      not A
    B        10          0
    not B     0    100,000

(4)           A      not A
    B         1          0
    not B     0          2

Scored with the root log-likelihood ratio (LLR), the tables come out at (1) 0.90, (2) 4.52, (3) 14.3, and (4) 1.95, so table (3) is by far the most anomalous co-occurrence (see the sketch below).

Ted Dunning, "Accurate Methods for the Statistics of Surprise and Coincidence", Computational Linguistics, vol. 19, no. 1 (1993)
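The scores above are root log-likelihood ratios, i.e. the square root of the G² statistic from the paper cited here. A minimal Python sketch of the computation, using only the standard library:

```python
from math import log, sqrt

def root_llr(k11, k12, k21, k22):
    """Root log-likelihood ratio (G-test) for a 2x2 contingency table.

    k11 = co-occurrences of A and B, k12 = B without A,
    k21 = A without B, k22 = neither.
    """
    def h(*counts):
        # sum of k * ln(k / total); a scaled (negative) entropy
        total = sum(counts)
        return sum(k * log(k / total) for k in counts if k > 0)

    g2 = 2.0 * (h(k11, k12, k21, k22)        # whole table
                - h(k11 + k12, k21 + k22)    # row sums
                - h(k11 + k21, k12 + k22))   # column sums
    return sqrt(max(g2, 0.0))

# The four tables above score roughly 0.90, 4.52, 14.3, and 1.95 respectively.
for table in [(13, 1000, 1000, 100000), (1, 0, 0, 10000),
              (10, 0, 0, 100000), (1, 0, 0, 2)]:
    print(round(root_llr(*table), 2))
```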
Collection of Documents: Insert Meta-Data
Item meta-data is ingested easily via NFS into the search engine. For example, the document for "puppy":
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
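As a sketch of what ends up in the engine (the JSON layout and the query below are illustrative assumptions, not any specific engine's API):

```python
import json

# The item document from this slide, as it might be sent to a search engine.
# Field names follow the slide; the actual indexing call depends on the engine
# (Solr, Elasticsearch, ...), so only the payload is shown here.
puppy_doc = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],
    "indicators": ["t1"],   # item IDs that anomalously co-occur with this item
}
print(json.dumps(puppy_doc, indent=2))

# Recommendation is then an ordinary search against the indicators field using
# the current user's recent history, e.g. in Lucene/Solr query syntax:
user_history = ["t1", "t7"]            # hypothetical recent items for this user
query = "indicators:(%s)" % " OR ".join(user_history)
print(query)                            # indicators:(t1 OR t7)
```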
Cooccurrence Mechanics
• Cooccurrence is just a self-join
for each user, i
    for each history item j1 in Ai*
        for each history item j2 in Ai*
            count pair (j1, j2)

$(A'A)_{j_1 j_2} = \sum_i a_{i j_1} a_{i j_2}$
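That loop translates directly into a few lines of Python (a sketch, not the Mahout job):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(history):
    """Self-join each user's history to count item pairs (entries of A'A).

    history: dict of user -> set of items (the rows Ai* of A).
    Only unordered pairs of distinct items are counted, since A'A is
    symmetric and the diagonal is not useful as an indicator.
    """
    counts = Counter()
    for items in history.values():
        for j1, j2 in combinations(sorted(items), 2):
            counts[(j1, j2)] += 1
    return counts

# Toy example with the earlier users:
history = {"Alice": {"apple", "puppy"}, "Bob": {"apple"}, "Charles": {"bicycle"}}
print(cooccurrence_counts(history))     # Counter({('apple', 'puppy'): 1})
```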
Cross-occurrence Mechanics
• Cross-occurrence is just a self-join of adjoined matrices
for each user, i
    for each history item j1 in Ai*
        for each history item j2 in Bi*
            count pair (j1, j2)

$(A'B)_{j_1 j_2} = \sum_i a_{i j_1} b_{i j_2}$

$[A \mid B]' [A \mid B] = \begin{bmatrix} A'A & A'B \\ B'A & B'B \end{bmatrix}$
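The same sketch extends to cross-occurrence by joining two per-user histories (the buy/view split used as an example here is an assumption, not from the slides):

```python
from collections import Counter

def cross_occurrence_counts(history_a, history_b):
    """Join two behaviors per user to count cross-occurring pairs (entries of A'B).

    history_a, history_b: dicts of user -> set of items for the two behaviors,
    for example items bought (A) and items viewed (B).
    """
    counts = Counter()
    for user, a_items in history_a.items():
        for j1 in a_items:
            for j2 in history_b.get(user, ()):
                counts[(j1, j2)] += 1
    return counts
```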
A word about scaling
A few pragmatic tricks
• Downsample all user histories to max length (interaction cut)
– Can be random or most-recent (no apparent effect on accuracy)
– Prolific users are often pathological anyway
– Common limit is 300 items (no apparent effect on accuracy)
• Downsample all items to limit max viewers (frequency limit)
– Can be random or earliest (no apparent effect)
– Ubiquitous items are uninformative
– Common limit is 500 users (no apparent effect)
Schelter et al., "Scalable Similarity-based Neighborhood Methods with MapReduce", Proceedings of the Sixth ACM Conference on Recommender Systems, 2012
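Both cuts are a few lines of Python each. This is a sketch of the random variants (the most-recent and earliest variants would need timestamps, which the sketch omits), with the limits mentioned above as defaults:

```python
import random

def interaction_cut(history, k_max=300, seed=0):
    """Keep at most k_max items per user (random variant of the interaction cut)."""
    rng = random.Random(seed)
    return {user: set(rng.sample(sorted(items), k_max)) if len(items) > k_max else set(items)
            for user, items in history.items()}

def frequency_cut(history, max_users=500, seed=0):
    """Keep at most max_users interactions per item (random variant of the frequency cut)."""
    rng = random.Random(seed)
    by_item = {}                                   # item -> users who interacted with it
    for user, items in history.items():
        for item in items:
            by_item.setdefault(item, []).append(user)
    kept = {(user, item)
            for item, users in by_item.items()
            for user in (rng.sample(users, max_users) if len(users) > max_users else users)}
    return {user: {item for item in items if (user, item) in kept}
            for user, items in history.items()}
```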
But note!
• Number of pairs for a user history with $k_i$ distinct items is ≈ $k_i^2/2$
• Average size of user history increases with increasing dataset
– Average may grow more slowly than N (or not!)
– Full cooccurrence cost grows strictly faster than N
– i.e. it just doesn’t scale
• Downsampling interactions places bounds on per user cost
– Cooccurrence with interaction cut is scalable
$t \propto \sum_i k_i^2 \in \omega(N)$   (full cooccurrence, no downsampling)

$t \propto \sum_i \min(k_{max}^2, k_i^2) < N k_{max}^2 \in O(N)$   (with the interaction cut at $k_{max}$)
[Figure: Benefit of down-sampling. Pairs (×10⁹) versus user limit, with track limits of 200, 500, and 1000 compared against no down-sampling. Computed on 48,373,586 pair-wise triples from the million song dataset.]
Batch Scaling in Time Implies Scaling in Space
• Note:
– With frequency limit sampling, max cooccurrence count is small (<1000)
– With interaction cut, total number of non-zero pairs is relatively small
– Entire cooccurrence matrix can be stored in memory in ~10-15 GB
• Specifically:
– With interaction cut, cooccurrence scales in size
– Without interaction cut, cooccurrence does not scale size-wise
$\| A'A \|_0 \in O(N)$   (with the interaction cut)

$\| A'A \|_0 \in \omega(N)$   (without the interaction cut)
Impact of Interaction Cut Downsampling
• Interaction cut allows batch cooccurrence analysis to be O(N) in
time and space
• This is intriguing
– Amortized cost is low
– Could this be extended to an on-line form?
• Incremental matrix factorization is hard
– Could cooccurrence be a key alternative?
• Scaling matters most at scale
– Cooccurrence is very accurate at large scale
– Factorization shows benefits at smaller scales
Online update
Requirements for Online Algorithms
• Each unit of input must require O(1) work
– Theoretical bound
• The constants have to be small enough on average
– Pragmatic constraint
• Total accumulated data must be small (enough)
– Pragmatic constraint
Real-time recommendations using the MapR data platform
[Architecture diagram: log files and item meta-data are ingested via NFS into a MapR cluster, where batch co-occurrence analysis (with pre- and post-processing) feeds the search technology; the web tier queries it with new user history, so recommendations happen in real time. The batch co-occurrence step is what we want to make real-time.]
Space Bound Implies Time Bound
• Because user histories are pruned, only a limited number of
value updates need be made with each new observation
• This bound is just twice the interaction cut kmax
– Which is a constant
• Bounding the number of updates trivially bounds the time
Implications for Online Update
$(A + e_{ij})'(A + e_{ij}) - A'A = e_{ij}'A + A'e_{ij} + \delta_{jj} = \begin{bmatrix} 0 \\ A_{i*} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & (A_{i*})' & 0 \end{bmatrix} + \delta_{jj}$

That is, a single new observation $e_{ij}$ (user $i$ interacts with item $j$) changes $A'A$ only in row $j$ and column $j$, and the changed entries are copies of the user's history row $A_{i*}$.

With the interaction cut at $k_{max}$, the number of non-zero entries touched is bounded:

$\left\| \begin{bmatrix} 0 \\ A_{i*} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & (A_{i*})' & 0 \end{bmatrix} + \delta_{jj} \right\|_0 \le 2 k_{max} - 1 \in O(1)$
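In code, the bounded update is just a loop over the user's current history; a sketch with in-memory counters (a real system would use a persistent store):

```python
from collections import Counter, defaultdict

cooccurrence = Counter()           # (j1, j2) -> count; both orders kept, i.e. A'A
user_history = defaultdict(set)    # user -> items seen so far (row Ai* of A)

def observe(user, item):
    """Fold one new observation (user, item) into the running A'A counts.

    Only row `item` and column `item` change, so with user histories capped
    at k_max items this touches at most 2*k_max - 1 entries.
    """
    if item in user_history[user]:
        return                                  # already counted
    for other in user_history[user]:
        cooccurrence[(item, other)] += 1        # row j of A'A
        cooccurrence[(other, item)] += 1        # column j of A'A
    cooccurrence[(item, item)] += 1             # the delta_jj term on the diagonal
    user_history[user].add(item)
```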
But Wait, It Gets Better
• The new observation may be pruned
– For users at the interaction cut, we can ignore updates
– For items at the frequency cut, we can ignore updates
– Ignoring updates only affects indicators, not recommendation query
– At million song dataset size, half of all updates are pruned
• On average ki is much less than the interaction cut
– For million song dataset, average appears to grow with log of frequency
limit, with little dependency on values of interaction cut > 200
• LLR cutoff avoids almost all updates to index
• Average grows slowly with frequency cut
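Continuing the sketch above, the pruning rules are simple guards in front of the update; the 300 and 500 limits are the common defaults mentioned earlier, and user_history and observe() come from the previous sketch:

```python
from collections import Counter

K_MAX = 300        # interaction cut
F_MAX = 500        # frequency cut

item_user_count = Counter()    # item -> number of distinct users seen so far

def maybe_observe(user, item):
    """Apply the pruning rules before paying for an update.

    Relies on user_history and observe() from the previous sketch.
    """
    if item in user_history[user]:
        return False                        # nothing new to count
    if len(user_history[user]) >= K_MAX:
        return False                        # user already at the interaction cut
    if item_user_count[item] >= F_MAX:
        return False                        # item already at the frequency cut
    item_user_count[item] += 1
    observe(user, item)
    return True
```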
[Figure: average history size k_ave versus interaction cut k_max, for frequency cuts of 200, 500, and 1000]

[Figure: average history size k_ave versus frequency cut]
Recap
• Cooccurrence-based recommendations are simple
– Deploying with a search engine is even better
• Interaction cut and frequency cut are key to batch scalability
• Similar effect occurs in online form of updates
– Only dozens of updates per transaction needed
– Data structure required is relatively small
– Very, very few updates cause search engine updates
• Fully online recommendation is very feasible, almost easy
More Details Available
Available for free at http://www.mapr.com/practical-machine-learning
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
Apache Drill http://incubator.apache.org/drill/
Twitter @ApacheDrill
Q&A
@mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies