
Ted Dunning, Chief Application Architect, MapR at MLconf SF


Abstract: Near real-time Updates for Cooccurrence-based Recommenders

Most recommendation algorithms are inherently batch oriented and require all relevant history to be processed. In some contexts such as music, this does not cause significant problems because waiting a day or three before recommendations are available for new items doesn’t significantly change their impact. In other contexts, the value of items drops precipitously with time so that recommending day-old items has little value to users.

In this talk, I will describe how a large-scale multi-modal cooccurrence recommender can be extended to include near real-time updates. In addition, I will show how these real-time updates are compatible with delivery of recommendations via search engines.



  1. 1. © 2014 MapR Technologies 1
  2. 2. © 2014 MapR Technologies 2 Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning Apache Mahout https://mahout.apache.org/ Twitter @ApacheMahout
  3. 3. © 2014 MapR Technologies 3 Agenda • Background – recommending with puppies and ponies • Speed tricks • Accuracy tricks • Moving to real-time
  4. 4. © 2014 MapR Technologies 4 Puppies and Ponies
  5. 5. © 2014 MapR Technologies 5 Cooccurrence Analysis
  6. 6. © 2014 MapR Technologies 6 How often do items co-occur? A'A
  7. 7. © 2014 MapR Technologies 7 Which co-occurrences are interesting? sparsify(A'A) Each row of indicators becomes a field in a search engine document
  8. 8. © 2014 MapR Technologies 8 Recommendations: Alice got an apple and a puppy
  9. 9. © 2014 MapR Technologies 9 Recommendations: Alice got an apple and a puppy; Charles got a bicycle
  10. 10. © 2014 MapR Technologies 10 Recommendations: Alice got an apple and a puppy; Bob got an apple; Charles got a bicycle
  11. 11. © 2014 MapR Technologies 11 Recommendations: Alice got an apple and a puppy; Bob got an apple; Charles got a bicycle. What else would Bob like?
  12. 12. © 2014 MapR Technologies 12 Recommendations: Alice got an apple and a puppy; Bob got an apple; Charles got a bicycle. A puppy!
  13. 13. By the way, like me, Bob also wants a pony… © 2014 MapR Technologies 13
  14. 14. © 2014 MapR Technologies 14 Recommendations: What if everybody gets a pony? What else would you recommend for new user Amelia?
  15. 15. © 2014 MapR Technologies 15 Recommendations: If everybody gets a pony, it's not a very good indicator of what else to predict...
  16. 16. © 2014 MapR Technologies 16 Problems with Raw Co-occurrence • Very popular items co-occur with everything, which is why it's not very helpful to know that everybody wants a pony – Examples: welcome document; elevator music • Very widespread occurrence is not a useful source of indicators for recommendation – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies) • What we want is anomalous co-occurrence – This is the source of interesting indicators of preference on which to base recommendations
  17. 17. Overview: Get Useful Indicators from Behaviors © 2014 MapR Technologies 17 1. Use log files to build history matrix of users x items – Remember: this history of interactions will be sparse compared to all potential combinations 2. Transform to a co-occurrence matrix of items x items 3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix – Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference – ItemSimilarityJob in Apache Mahout uses LLR
  18. 18. © 2014 MapR Technologies 18 Which one is the anomalous co-occurrence? Four 2×2 contingency tables (rows: B, not B; columns: A, not A): (a) [1, 0; 0, 2] (b) [13, 1000; 1000, 100000] (c) [1, 0; 0, 10000] (d) [10, 0; 0, 100000]
  19. 19. © 2014 MapR Technologies 19 Which one is the anomalous co-occurrence? The same four tables with their root-LLR scores: (a) [1, 0; 0, 2] scores 1.95 (b) [13, 1000; 1000, 100000] scores 0.90 (c) [1, 0; 0, 10000] scores 4.52 (d) [10, 0; 0, 100000] scores 14.3. Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence, Computational Linguistics vol. 19 no. 1 (1993)
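The scores on this slide can be reproduced with the log-likelihood ratio test from the cited paper (the talk points at Mahout's ItemSimilarityJob for the production implementation). A minimal sketch, assuming the usual 2×2 layout where k11 counts users showing both behaviors; the function name and layout are mine:

```python
import math

def root_llr(k11, k12, k21, k22):
    """Root log-likelihood ratio for a 2x2 co-occurrence contingency table.

    k11 = both A and B, k12 = B only, k21 = A only, k22 = neither.
    Uses G^2 = 2 * sum k * ln(k * N / (rowsum * colsum)), skipping empty cells.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for k, r, c in ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
                    (k21, rows[1], cols[0]), (k22, rows[1], cols[1])):
        if k > 0:
            g2 += k * math.log(k * n / (r * c))
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 2 * g2))
```

For example, root_llr(13, 1000, 1000, 100000) comes out near 0.90 while root_llr(10, 0, 0, 100000) is about 14.3: the perfectly-correlated large table is far more anomalous than the big but weakly-associated one.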
  20. 20. © 2014 MapR Technologies 20 Insert Meta-Data: item meta-data is ingested easily via NFS into a collection of documents in the search engine. Document for "puppy": id: t4; title: puppy; desc: The sweetest little puppy ever.; keywords: puppy, dog, pet; indicators: (t1)
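As a toy illustration of that deployment pattern: rank item documents by how many of the user's recent history items appear in each document's indicators field. Everything here (the ids, the extra "pony" document, the in-memory stand-in for the search engine) is illustrative only; a real deployment issues the history as a query to an actual search engine.

```python
# Each item document carries an "indicators" field produced by the
# sparsified cooccurrence analysis; item ids here are illustrative.
docs = [
    {"id": "t4", "title": "puppy", "indicators": {"t1"}},
    {"id": "t7", "title": "pony",  "indicators": {"t1", "t4"}},
]

def recommend(history, docs, n=1):
    # Score each document by overlap between the user's history and its
    # indicators, drop items the user already has, return the top n ids.
    scored = [(len(set(history) & d["indicators"]), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0 and d["id"] not in history]
    scored.sort(key=lambda sd: -sd[0])
    return [d["id"] for s, d in scored[:n]]
```

A user whose history is {"t1", "t4"} would be shown "t7": the puppy they already have is filtered out, and the pony document matches two indicators.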
  21. 21. © 2014 MapR Technologies 22 Cooccurrence Mechanics • Cooccurrence is just a self-join: for each user i, for each history item j1 in Ai*, for each history item j2 in Ai*, count pair (j1, j2). That is, Σᵢ a(i,j1)·a(i,j2) = (A'A)(j1,j2)
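The self-join loop on this slide translates directly into a few lines; a minimal in-memory sketch (names mine, ignoring downsampling and sparsification):

```python
from collections import Counter

def cooccurrence(histories):
    """Count item pairs within each user's history: counts[(j1, j2)] = (A'A)[j1][j2]."""
    counts = Counter()
    for items in histories:          # one history per user (row A_i*)
        for j1 in items:
            for j2 in items:
                counts[(j1, j2)] += 1
    return counts
```

With histories [["apple", "puppy"], ["apple", "puppy", "pony"]], the pair ("apple", "puppy") is counted twice and the diagonal entry ("apple", "apple") records apple's total frequency.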
  22. 22. © 2014 MapR Technologies 23 Cross-occurrence Mechanics • Cross-occurrence is just a self-join of adjoined matrices: for each user i, for each history item j1 in Ai*, for each history item j2 in Bi*, count pair (j1, j2). That is, Σᵢ a(i,j1)·b(i,j2) = (A'B)(j1,j2), and [A | B]' [A | B] = [A'A, A'B; B'A, B'B]
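The cross-occurrence loop differs from the self-join only in drawing j2 from the second behavior matrix; a sketch under the same assumptions, with the two history lists aligned by user:

```python
from collections import Counter

def cross_occurrence(a_histories, b_histories):
    """Count pairs across two behavior types per user: counts[(j1, j2)] = (A'B)[j1][j2]."""
    counts = Counter()
    for a_items, b_items in zip(a_histories, b_histories):
        for j1 in a_items:           # e.g. purchases (rows of A_i*)
            for j2 in b_items:       # e.g. page views (rows of B_i*)
                counts[(j1, j2)] += 1
    return counts
```

This is what makes the recommender multi-modal: indicators for buying an item can be mined from a different, cheaper behavior such as viewing.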
  23. 23. © 2014 MapR Technologies 24 A word about scaling
  24. 24. © 2014 MapR Technologies 25 A few pragmatic tricks • Downsample all user histories to max length (interaction cut) – Can be random or most-recent (no apparent effect on accuracy) – Prolific users are often pathological anyway – Common limit is 300 items (no apparent effect on accuracy) • Downsample all items to limit max viewers (frequency limit) – Can be random or earliest (no apparent effect) – Ubiquitous items are uninformative – Common limit is 500 users (no apparent effect) Schelter, et al. Scalable similarity-based neighborhood methods with MapReduce. Proceedings of the sixth ACM conference on Recommender systems. 2012
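The two cuts on this slide can be sketched as follows (function names and the choice of random sampling are mine; the slide notes that random vs. most-recent and random vs. earliest make no apparent difference to accuracy):

```python
import random

def interaction_cut(history, k_max=300):
    """Keep at most k_max interactions per user (random downsample)."""
    if len(history) <= k_max:
        return list(history)
    return random.sample(list(history), k_max)

def frequency_cut(histories, max_viewers=500):
    """Stop counting an item once max_viewers users have interacted with it."""
    seen = {}
    out = []
    for items in histories:
        kept = []
        for j in items:
            if seen.get(j, 0) < max_viewers:
                seen[j] = seen.get(j, 0) + 1
                kept.append(j)
        out.append(kept)
    return out
```

Both cuts discard only the least informative data: pathologically long histories and ubiquitous, uninformative items.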
  25. 25. © 2014 MapR Technologies 26 But note! • Number of pairs for a user history with ki distinct items is ≈ ki²/2 • Average size of user history increases with increasing dataset – Average may grow more slowly than N (or not!) – Full cooccurrence cost grows strictly faster than N: t ∝ Σᵢ ki² ∉ O(N) – i.e. it just doesn't scale • Downsampling interactions places bounds on per-user cost: t ∝ Σᵢ min(kmax, ki)² < N·kmax² ∈ O(N) – Cooccurrence with interaction cut is scalable
  26. 26. © 2014 MapR Technologies 27 [Figure: Benefit of down-sampling. Number of pairs (×10⁹) vs. user limit, shown without down-sampling and with track limits of 1000, 500, and 200. Computed on 48,373,586 pair-wise triples from the million song dataset.]
  27. 27. © 2014 MapR Technologies 28 Batch Scaling in Time Implies Scaling in Space • Note: – With frequency limit sampling, max cooccurrence count is small (<1000) – With interaction cut, total number of non-zero pairs is relatively small – Entire cooccurrence matrix can be stored in memory in ~10-15 GB • Specifically: – With interaction cut, cooccurrence scales in size: ‖A'A‖₀ ∈ O(N) – Without interaction cut, cooccurrence does not scale size-wise: ‖A'A‖₀ ∈ ω(N)
  28. 28. © 2014 MapR Technologies 29 Impact of Interaction Cut Downsampling • Interaction cut allows batch cooccurrence analysis to be O(N) in time and space • This is intriguing – Amortized cost is low – Could this be extended to an on-line form? • Incremental matrix factorization is hard – Could cooccurrence be a key alternative? • Scaling matters most at scale – Cooccurrence is very accurate at large scale – Factorization shows benefits at smaller scales
  29. 29. © 2014 MapR Technologies 30 Online update
  30. 30. © 2014 MapR Technologies 31 Requirements for Online Algorithms • Each unit of input must require O(1) work – Theoretical bound • The constants have to be small enough on average – Pragmatic constraint • Total accumulated data must be small (enough) – Pragmatic constraint
  31. 31. © 2014 MapR Technologies 32 Real-time recommendations using the MapR data platform: log files and item meta-data are ingested via NFS into the MapR cluster; batch co-occurrence analysis feeds the search technology; the web tier queries the search engine with new user history, so recommendation delivery already happens in real time. The goal is to make the co-occurrence step real-time as well.
  32. 32. © 2014 MapR Technologies 33 Space Bound Implies Time Bound • Because user histories are pruned, only a limited number of value updates need be made with each new observation • This bound is just twice the interaction cut kmax – Which is a constant • Bounding the number of updates trivially bounds the time
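A sketch of this bounded per-observation update (function and variable names are mine; a production version would also honor the frequency cut and push only LLR-passing changes to the search engine):

```python
from collections import Counter

def observe(history, new_item, counts, k_max=300):
    """Online cooccurrence update for one (user, item) observation.

    Touches at most 2*(k_max - 1) + 1 = 2*k_max - 1 pair counts,
    which is the O(1) per-observation bound the interaction cut guarantees.
    """
    if len(history) >= k_max:
        return history                    # interaction cut: ignore the update
    for j in history:
        counts[(new_item, j)] += 1        # new row entries of A'A
        counts[(j, new_item)] += 1        # new column entries of A'A
    counts[(new_item, new_item)] += 1     # the diagonal delta
    return history + [new_item]
```

Because each observation touches a constant number of counts, the accumulated structure stays current without ever re-running the batch job.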
  33. 33. © 2014 MapR Technologies 34 Implications for Online Update: (A + eij)'(A + eij) − A'A = eij'A + A'eij + δjj. The term eij'A adds the user's history row Ai* at row j, A'eij adds the column (Ai*)' at column j, and δjj adds one count at position (j, j).
  34. 34. © 2014 MapR Technologies 35 With interaction cut at kmax: ‖Ai* + (Ai*)' + δjj‖₀ ≤ 2·kmax − 1 ∈ O(1)
  35. 35. © 2014 MapR Technologies 36 But Wait, It Gets Better • The new observation may be pruned – For users at the interaction cut, we can ignore updates – For items at the frequency cut, we can ignore updates – Ignoring updates only affects indicators, not recommendation query – At million song dataset size, half of all updates are pruned • On average ki is much less than the interaction cut – For million song dataset, average appears to grow with log of frequency limit, with little dependency on values of interaction cut > 200 • LLR cutoff avoids almost all updates to counting • Average grows slowly with frequency cut
  36. 36. © 2014 MapR Technologies 37 [Figure: average history length kave vs. interaction cut (kmax), for frequency cuts of 1000, 500, and 200.]
  37. 37. © 2014 MapR Technologies 38 [Figure: average history length kave vs. frequency cut.]
  38. 38. © 2014 MapR Technologies 39 Recap • Cooccurrence-based recommendations are simple – Deploying with a search engine is even better • Interaction cut and frequency cut are key to batch scalability • Similar effect occurs in online form of updates – Only dozens of updates per transaction needed – Data structure required is relatively small – Very, very few updates cause search engine updates • Fully online recommendation is very feasible, almost easy
  39. 39. © 2014 MapR Technologies 40 More Details Available: available for free at http://www.mapr.com/practical-machine-learning
  40. 40. © 2014 MapR Technologies 41 Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning Apache Mahout https://mahout.apache.org/ Twitter @ApacheMahout Apache Drill http://incubator.apache.org/drill/ Twitter @ApacheDrill
  41. 41. © 2014 MapR Technologies 42 Q & A Engage with us! @mapr maprtech tdunning@mapr.com MapR maprtech mapr-technologies
