Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Recommendation Techn

1,395 views

Published on

  • WOW! Wish we had had this info 3 years ago! In just the last few hours our sibling boys have lowered the intensity and length of barking episodes by at least 50%!!! I can't wait to see the results a month from now!! ★★★ https://bit.ly/38b79Wm
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • DOWNLOAD FULL eBOOK INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, CookeBOOK Crime, eeBOOK Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Recommendation Techn

  1. 1. © 2014 MapR Techno©lo 2g0ie1s4 MapR Technologies 1
  2. 2. © 2014 MapR Technologies 2 Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning Apache Mahout https://mahout.apache.org/ Twitter @ApacheMahout 22 June 2014 Big Data Everywhere Conference #DataIsrael
  3. 3. Recommendation: Widely Used Machine Learning Example: Open source Apache Mahout used in production © 2014 MapR Technologies 3 http://www.wired.com/wiredenterprise/2012/12/mahout/
  4. 4. © 2014 MapR Technologies 4 Recommendations – Data: interactions between people taking action (users) and items • Data used to train recommendation model – Goal is to suggest additional interactions – Example applications: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
  5. 5. Google maps: restaurant recommendations © 2014 MapR Technologies 5
  6. 6. © 2014 MapR Technologies 6 Google maps: tech recommendations
  7. 7. © 2014 MapR Technologies 7 Tutorial Part 1: How recommendation works, or “I want a pony”…
  8. 8. First question: Are you using the right data? © 2014 MapR Technologies 9
  9. 9. © 2014 MapR Technologies 10 Recommendation Behavior of a crowd helps us understand what individuals will do
  10. 10. © 2014 MapR Technologies 11 Recommendations Alice got an apple and Alice a puppy
  11. 11. © 2014 MapR Technologies 12 Recommendations Alice got an apple and Alice a puppy Charles Charles got a bicycle
  12. 12. © 2014 MapR Technologies 13 Recommendations Alice got an apple and Alice a puppy Bob Bob got an apple Charles Charles got a bicycle
  13. 13. © 2014 MapR Technologies 14 Recommendations Alice got an apple and Alice a puppy Bob What else would Bob like? Charles Charles got a bicycle
  14. 14. © 2014 MapR Technologies 15 Recommendations Alice got an apple and Alice a puppy Bob A puppy! Charles Charles got a bicycle
  15. 15. © 2014 MapR Technologies 16 You get the idea of how recommenders can work…
  16. 16. By the way, like me, Bob also wants a pony… © 2014 MapR Technologies 17
  17. 17. © 2014 MapR Technologies 18 Recommendations ? Alice Bob Amelia Charles What if everybody gets a pony? What else would you recommend for new user Amelia?
  18. 18. © 2014 MapR Technologies 19 Recommendations ? Alice Bob Amelia Charles If everybody gets a pony, it’s not a very good indicator of what to else predict...
  19. 19. © 2014 MapR Technologies 20 Problems with Raw Co-occurrence • Very popular items co-occur with everything or why it’s not very helpful to know that everybody wants a pony… – Examples: Welcome document; Elevator music • Very widespread occurrence is not interesting to generate indicators for recommendation – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies) • What we want is anomalous co-occurrence – This is the source of interesting indicators of preference on which to base recommendation
  20. 20. Overview: Get Useful Indicators from Behaviors © 2014 MapR Technologies 21 1. Use log files to build history matrix of users x items – Remember: this history of interactions will be sparse compared to all potential combinations 2. Transform to a co-occurrence matrix of items x items 3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix – Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference – ItemSimilarityJob in Apache Mahout uses LLR
  21. 21. Apache Mahout: Overview • Open source Apache project http://mahout.apache.org/ • Mahout version is 0.9 released Feb 2014; inc Scala © 2014 MapR Technologies 22 – Summary 0.9 blog at http://bit.ly/1rirUUL • Library of scalable algorithms for machine learning – Some run on Apache Hadoop distributions; others do not require Hadoop – Some can be run at small scale – Some are run in parallel; others are sequential • Includes the following main areas: – Clustering & related techniques – Classification – Recommendation – Mahout Math Library
  22. 22. © 2014 MapR Technologies 24 Log Files Alice Charles Charles Alice Alice Bob Bob
  23. 23. © 2014 MapR Technologies 25 Log Files u1 u2 u2 u1 u1 u3 u3 t1 t4 t3 t2 t3 t3 t1
  24. 24. © 2014 MapR Technologies 26 History Matrix: Users x Items Alice Bob Charles ✔ ✔ ✔ ✔ ✔ ✔ ✔
  25. 25. © 2014 MapR Technologies 27 Co-Occurrence Matrix: Items x Items 1 2 0 1 1 1 1 1 0 2 0 0 How do you tell which co-occurrences are useful?
  26. 26. © 2014 MapR Technologies 28 Co-Occurrence Matrix: Items x Items 1 2 0 1 1 1 1 1 0 2 0 0 Use LLR test to turn co-occurrence into indicators…
  27. 27. © 2014 MapR Technologies 29 Co-occurrence Binary Matrix 1 not 1 not 1
  28. 28. Which one is the anomalous co-occurrence? A not A B 1 0 not B 0 2 © 2014 MapR Technologies 30 A not A B 13 1000 not B 1000 100,000 A not A B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000
  29. 29. Which one is the anomalous co-occurrence? A not A 0.90 1.95 B 1 0 not B 0 2 © 2014 MapR Technologies 31 A not A B 13 1000 not B 1000 100,000 A not A 4.52 14.3 B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000
  30. 30. © 2014 MapR Technologies 32 Co-Occurrence Matrix: Items x Items 1 2 0 1 1 1 1 1 0 2 0 0 Recap: Use LLR test to turn co-occurrence into indicators
  31. 31. Indicator Matrix: Anomalous Co-Occurrence Result: Each marked row shows the indicators of what to recommend… © 2014 MapR Technologies 33 ✔ ✔
  32. 32. Indicator Matrix: Anomalous Co-Occurrence © 2014 MapR Technologies 34 ✔ ✔ Why not pony + other item?
  33. 33. © 2014 MapR Technologies 36 How will you deliver recommendations to users?
  34. 34. © 2014 MapR Technologies 37 Seeking Simplicity Innovation: Exploit search technology to deploy your recommendation system
  35. 35. © 2014 MapR Technologies 38 But first, a look at how search works…
  36. 36. Apache Solr/Apache Lucene • Apache Solr/Lucene is an open-source powerful search engine used for flexible, heavily indexed queries including data such as – Full text, geographical data, statistically weighted data © 2014 MapR Technologies 39 • Lucene – Provides core retrieval – Is a low-level library • Solr – Is a web-based wrapper around Lucene – Is easy to integrate because you talk to it via a web-interface • URL http://host machine:8888
  37. 37. © 2014 MapR Technologies 40 LucidWorks • Enterprise platform and collection of applications based on Apache Solr/Lucene – Wrapper around Solr – A free version ships with MapR • LucidWorks leaves Solr exposed but makes Solr administration much easier, which in turn makes it easier to use Lucene • URL http://host machine:8989
  38. 38. © 2014 MapR Technologies 41 LucidWorks Solr Query Response Index Lucene Query Response Data Source Relationship of Solr /Lucene/LucidWorks
  39. 39. © 2014 MapR Technologies 42 Other Options: Elastic Search • Apache Lucene library at the heart of several approaches – It can also be used on its own • Elastic Search is a different web interface for Lucene – Real time search and analytics – Open source (not Apache) – Big advantage is less accumulated cruft http://www.elasticsearch.com/
  40. 40. © 2014 MapR Technologies 44 What is a Document? • Data is stored in collections made up of documents • Documents contain fields that can be – Indexed • makes field searchable; don’t have to index all – Stored • If you want Solr to return content, have Solr store content • Not all data of interest must be stored: can access via stored URL (good for very large data set) – Multi-value • Body field can contain more than one type of data – Facetted • A way to refine a search or use for statistics • Example: data for country: Could return facetted as “37 from US, 23 UK, 7 Japan”
  41. 41. © 2014 MapR Technologies 45 Fields and How to Set Them Up • Lucene is mostly used for text – Text has to be tokenized • Also supports other types – Long, string, keywords, comma separated • Fields properties such as stored, indexed, faceted can also be defined • Defaults aren’t usually so great
  42. 42. © 2014 MapR Technologies 46 Example of Facetted Search The indexed fields “Area” and “Gender” have been facetted to provide counts for the results
  43. 43. Field-specific Searches Using Lucene Syntax • Documents often have title, author, keywords and body text • General syntax is © 2014 MapR Technologies 48 field:(term1 term2) • Alternatives field:(term1 OR term2) field:(“term1 term2” ~ 5) field:(term1 AND term2) • Default field, default interpretations work very well for text
  44. 44. © 2014 MapR Technologies 49 Send Data to Solr (not LucidWorks) • LucidWorks has lots of spiders that can use file extension or mime type to trigger certain file parsers – Works best with web-ish sources and mime types – Also includes MapR and MapR high volume indexers • More common at modest volumes or for updates to use JSON format {"id":"book_314", "title":{"set":"The Call of the Wild"}} • Use REST interface to send update files curl http://localhost:8983/solr/update -H 'Content-type:application/json' --data-binary @file.json
  45. 45. Back to recommendation: How do you abuse search to make recommendation easy? © 2014 MapR Technologies 51
  46. 46. Collection of Documents: Insert Meta-Data © 2014 MapR Technologies 52 Search Technology Item meta-data Ingest easily via NFS Document for “puppy” id: t4 title: puppy desc: The sweetest little puppy ever. keywords: puppy, dog, pet
  47. 47. From Indicator Matrix to New Indicator Field © 2014 MapR Technologies 53 ✔ id: t4 title: puppy desc: The sweetest little puppy ever. keywords: puppy, dog, pet indicators: (t1) Solr document for “puppy” Note: data for the indicator field is added directly to meta-data for a document in Apache Solr or Elastic Search index. You don’t need to create a separate index for the indicators.
  48. 48. Let’s look at a real example: we built a music recommender © 2014 MapR Technologies 54
  49. 49. © 2014 MapR Technologies 56
  50. 50. User activity: Listens to classic jazz hit “Take the A Train” © 2014 MapR Technologies 57
  51. 51. System delivers recommendations based on activity © 2014 MapR Technologies 58
  52. 52. © 2014 MapR Technologies 59 Let’s look inside the music recommender…
  53. 53. Music Meta Data for Search Document Collections © 2014 MapR Technologies 60 • MusicBrainz data • Data includes Artist ID, MusicBrainz ID, Name, Group/Person, From (geo locations) and Gender as seen in this sample
  54. 54. Sample User Behavior Histories: Music Log Files © 2014 MapR Technologies 62 13 START 10113 2182654281 23 BEACON 10113 2182654281 24 START 10113 79600611935028 34 BEACON 10113 79600611935028 44 BEACON 10113 79600611935028 54 BEACON 10113 79600611935028 64 BEACON 10113 79600611935028 74 BEACON 10113 79600611935028 84 BEACON 10113 79600611935028 94 BEACON 10113 79600611935028 104 BEACON 10113 79600611935028 109 FINISH10113 79600611935028 111 START 10113 58999912011972 121 BEACON 10113 58999912011972 Time Event type User ID Artist ID Track ID
  55. 55. © 2014 MapR Technologies 63 Sample Music Log Files Artist ID for jazz musician Duke Ellington What has user 119 done here in the highlighted lines?
  56. 56. © 2014 MapR Technologies 64 Internals of a Recommendation Engine
  57. 57. © 2014 MapR Technologies 65 Internals of a Recommendation Engine
  58. 58. Lucene Documents for Music Recommendation Notice that data from indicator matrix of trained Mahout recommender model has been added to indicator field in documents of the artists collection © 2014 MapR Technologies 66 id 1710 mbid 592a3b6d-c42b-4567-99c9-ecf63bd66499 name Chuck Berry area United States gender Male indicator_artists 386685,875994,637954,3418,1344,789739,1460, … id 541902 mbid 983d4f8f-473e-4091-8394-415c105c4656 name Charlie Winston area United Kingdom gender None indicator_artists 997727,815,830794,59588,900,2591,1344,696268, …
  59. 59. © 2014 MapR Technologies 68 Offline Analysis Analysis Using Mahout Item Meta-Data Users History Search Technology Log Files Indicators
  60. 60. Real-time recommendations using MapR data platform © 2014 MapR Technologies 69 Log Files New User History via NFS Python Mahout Analysis Search Technology Item Meta-Data Ingest easily via NFS MapR Cluster Use Python directly via NFS Pig Web Recommendations Tier
  61. 61. © 2014 MapR Technologies 70 A Quick Simplification • Users who do h • Also do Ah AT Ah ( ) (ATA)h User-centric recommendations Item-centric recommendations
  62. 62. © 2014 MapR Technologies 71 Architectural Advantage AT (Ah) AT( A)h User-centric recommendations Item-centric recommendations
  63. 63. © 2014 MapR Technologies 72 Architectural Advantage AT Ah ( ) User-centric recommendations With the first design, you have to do the real-time computation first (in parenthesis). No way to pre-compute. Less efficient, less fast. (ATA)h Item-centric recommendations With the second design, you can pre-compute offline (overnight) things that change slowly. Only the smaller computation for new user vector (h) is done in real-time, so response is very fast.
  64. 64. © 2014 MapR Technologies 74 Tutorial Part 2: How to make recommendation better
  65. 65. Going Further: Multi-Modal Recommendation © 2014 MapR Technologies 75
  66. 66. Going Further: Multi-Modal Recommendation © 2014 MapR Technologies 76
  67. 67. © 2014 MapR Technologies 77 For example • Users enter queries (A) – (actor = user, item=query) • Users view videos (B) – (actor = user, item=video) • ATA gives query recommendation – “did you mean to ask for” • BTB gives video recommendation – “you might like these videos”
  68. 68. © 2014 MapR Technologies 78 The punch-line • BTA recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
  69. 69. © 2014 MapR Technologies 79 Real-life example • Query: “Paco de Lucia” • Conventional meta-data search results: – “hombres de paco” times 400 – not much else • Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  70. 70. © 2014 MapR Technologies 80 Real-life example
  71. 71. © 2014 MapR Technologies 81 Hypothetical Example • Want a navigational ontology? • Just put labels on a web page with traffic – This gives A = users x label clicks • Remember viewing history – This gives B = users x items • Cross recommend – B’A = label to item mapping • After several users click, results are whatever users think they should be
  72. 72. © 2014 MapR Technologies 82 Nice. But we can do better?
  73. 73. © 2014 MapR Technologies 84 Symmetry Gives Cross Recommentations (ATA)h BT( A)h Conventional recommendations with off-line learning Cross recommendations
  74. 74. © 2014 MapR Technologies 85 things A users
  75. 75. © 2014 MapR Technologies 86 A1 A2 é ë ù û users thing type 1 thing type 2
  76. 76. © 2014 MapR Technologies 87 A1 A2 éë T ù û A1 A2 éë ùû = T A1 T A2 é ê ê ë ù ú ú û A1 A2 éë ùû = TA1 A1 A1 TA2 AT 2A1 AT 2A2 é ê ê ë ù ú ú û r1 r2 é ê ê ë ù ú ú û = TA1 A1 A1 TA2 AT 2A1 AT 2A2 é ê ê ë ù ú ú ê ê û h1 h2 é ë ù ú ú û TA1 A1 r1 = A1 TA2 é ëê ù ê ê ûú h1 h2 é ë ù ú ú û
  77. 77. © 2014 MapR Technologies 88 Bonus Round: When worse is better
  78. 78. © 2014 MapR Technologies 89 The Real Issues After First Production • Exploration • Diversity • Speed • Not the last fraction of a percent
  79. 79. © 2014 MapR Technologies 90 Result Dithering • Dithering is used to re-order recommendation results – Re-ordering is done randomly • Dithering is guaranteed to make off-line performance worse • Dithering also has a near perfect record of making actual performance much better
  80. 80. © 2014 MapR Technologies 91 Result Dithering • Dithering is used to re-order recommendation results – Re-ordering is done randomly • Dithering is guaranteed to make off-line performance worse • Dithering also has a near perfect record of making actual performance much better “Made more difference than any other change”
  81. 81. © 2014 MapR Technologies 92 Why Dithering Works Real-time recommender Overnight training Log Files
  82. 82. © 2014 MapR Technologies 93 Why Use Dithering?
  83. 83. © 2014 MapR Technologies 94 Simple Dithering Algorithm • Synthetic score from log rank plus Gaussian s = logr + N(0, loge ) • Pick noise scale to provide desired level of mixing • Typically Dr r μ e e Î [1.5,3] • Also… use floor(t/T) as seed
  84. 84. © 2014 MapR Technologies 95 Example … ε = 2 1 2 8 3 9 15 7 6 1 8 14 15 3 2 22 10 1 3 8 2 10 5 7 4 1 2 10 7 3 8 6 14 1 5 33 15 2 9 11 29 1 2 7 3 5 4 19 6 1 3 5 23 9 7 4 2 2 4 11 8 3 1 44 9 2 3 1 4 6 7 8 33 3 4 1 2 10 11 15 14 11 1 2 4 5 7 3 14 1 8 7 3 22 11 2 33
  85. 85. © 2014 MapR Technologies 96 Lesson: Exploration is good
  86. 86. © 2014 MapR Technologies 97 Thank you Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning Apache Mahout https://mahout.apache.org/ Twitter @ApacheMahout 22 June 2014 Big Data Everywhere Conference #DataIsrael

×