Using Mahout and a Search Engine for Recommendation



I presented this talk at the Open World Forum in Paris in 2013. The idea is that you can build basic recommendations, as well as extended forms such as intelligent search, cross-recommendation, and multi-modal recommendation, using Mahout's cooccurrence analysis together with a search engine.

Published in: Technology, Education
  • Note to speaker: Move quickly through the first two slides, just to set the tone: familiar use cases backed by somewhat complicated under-the-covers math and algorithms. You don’t need to explain or discuss these examples at this point; just mention one or two. Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email.
  • Note to trainers: The next series of slides starts with a cartoon example, just to set the pattern of how to find co-occurrence and use it to find indicators of what to recommend. Of course, real examples require a LOT of user-item interaction history to actually work, so this is just an analogy to get the idea across.
  • A history of what everybody has done. Obviously this is just a cartoon, because large numbers of users and interactions with items would be required to build a recommender. The next step will be to predict what a new user might like.
  • Bob is the “new user”, and getting an apple is his history.
  • Here is where the recommendation engine needs to go to work. Note to trainer: you might see if the audience calls out the answer before revealing the next slide.
  • Now you see the idea of co-occurrence as a basis for recommendation…
  • Now we have a new user, Amelia. Like everybody else, she gets a pony. What should the recommender offer her based on her history?
  • The pony is not interesting, because it is so widespread that it does not differentiate a pattern.
  • Note to trainer: This situation is similar to the one in which we started, with three users in our history. The difference is that now everybody got a pony. Bob has an apple and a pony but not a puppy… yet.
  • The binary matrix is stored sparsely.
  • Convert by MapReduce into a binary matrix. Note to trainer: Whether to consider apple to have occurred with itself is an open question.
  • Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence
  • The only important co-occurrence is that puppy follows apple.
  • Take that row of the matrix and combine it with all the meta-data we might have. The important thing to get from the co-occurrence matrix is this indicator. Cool thing: this is analogous to what a lot of recommendation engines do. This row forms the indicator field in a Solr document containing meta-data (you do NOT have to build a separate index for the indicators). Find the useful co-occurrence and get rid of the rest: sparsify and keep the anomalous co-occurrence.
  • Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
  • This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrence). Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains the meta-data for the item in question.
  • Note to trainer: you could ask the class to consider which data is related… for example, the first 3 bullets of the query relate to meta data for the item, not to data produced by the recommendation algorithm. The last 3 bullets refer to data in the sample query related to data in the indicator field(s) that were produced by the Mahout recommendation engine.
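
The co-occurrence idea in these notes can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the history is the cartoon data from the slides (Alice, Bob, and Charles, after everybody got a pony), the function names and the 0.5 threshold are made up for this example, and only pairs that actually co-occur are scored. It is not Mahout's API; the slides credit RowSimilarityJob in Apache Mahout with doing the LLR step at scale.

```python
from itertools import combinations
from math import log

# Cartoon history from the slides: user -> set of items (layout is an assumption).
history = {
    "Alice": {"apple", "puppy", "pony"},
    "Bob": {"apple", "pony"},
    "Charles": {"bicycle", "pony"},
}

def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def indicators(history, threshold=0.5):
    """Map each item to the items whose co-occurrence with it looks anomalous.

    The threshold is a hypothetical value chosen for this tiny data set.
    """
    users = list(history.values())
    items = sorted(set().union(*users))
    result = {item: [] for item in items}
    for a, b in combinations(items, 2):
        k11 = sum(1 for h in users if a in h and b in h)
        if k11 == 0:
            continue  # only score pairs that co-occur at least once
        k12 = sum(1 for h in users if a in h and b not in h)
        k21 = sum(1 for h in users if b in h and a not in h)
        k22 = len(users) - k11 - k12 - k21
        if llr(k11, k12, k21, k22) > threshold:
            result[a].append(b)
            result[b].append(a)
    return result

def recommend(user_items, indicator_map):
    """Suggest unseen items whose indicators overlap the user's history."""
    return sorted(item for item, ind in indicator_map.items()
                  if item not in user_items and user_items & set(ind))

indicator_map = indicators(history)
print(indicator_map["puppy"])                       # ['apple']
print(recommend({"apple", "pony"}, indicator_map))  # Bob: ['puppy']
print(recommend({"pony"}, indicator_map))           # Amelia: []
```

The pony scores an LLR of zero against every other item because everybody has one, so it never becomes an indicator. That is the "anomalous co-occurrence" point made in the notes above.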

    1. ©MapR Technologies 2013 - Multi-input Recommendations
    2. Me, Us
        • Ted Dunning, Chief Application Architect, MapR
            – Committer, PMC member: Mahout, Zookeeper, Drill
            – Bought the beer at the first HUG
        • MapR
            – Distributes more open source components for Hadoop
            – Adds major technology for performance, HA, industry standard APIs
        • Info
            – Hash tag: #mapr
            – See also: @ApacheMahout, @ApacheDrill, @ted_dunning, and @mapR
    3. Topic For Today
        • What is recommendation?
        • What makes it different?
        • What is multi-modal recommendation?
        • How can I build it using common household items?
    4. Oh … Also This
        • Detailed break-down of a live machine learning system running with Mahout on MapR
        • With code examples
    5. I may have to summarize
    6. I may have to summarize just a bit
    7. Part 1: 5 minutes of background
    8. Part 2: 10 minutes: I want a pony
    9. Part 3: Implementation Examples
    10. [no text on slide]
    11. Part 1: 5 minutes of background
    12. What Does Machine Learning Look Like?
    13. Recommendations as Machine Learning
        • Recommendation:
            – Involves observation of interactions between people taking action (users) and items, as input data to the recommender model
            – Goal is to suggest additional appropriate or desirable interactions
            – Applications include: movie, music, or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
    14. [no text on slide]
    15. [no text on slide]
    16. Part 2: How recommenders work (I still want a pony)
    17. Recommendations. Recap: Behavior of a crowd helps us understand what individuals will do
    18. Recommendations: Alice got an apple and a puppy. Charles got a bicycle. (Pictured: Alice, Charles)
    19. Recommendations: Alice got an apple and a puppy. Charles got a bicycle. Bob got an apple. (Pictured: Alice, Bob, Charles)
    20. Recommendations: What else would Bob like? (Pictured: Alice, Bob, Charles)
    21. Recommendations: A puppy, of course! (Pictured: Alice, Bob, Charles)
    22. You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)
    23. Recommendations: What if everybody gets a pony? What else would you recommend for Amelia? (Pictured: Alice, Bob, Charles, Amelia)
    24. Recommendations: If everybody gets a pony, it’s not a very good indicator of what else to predict... (Pictured: Alice, Bob, Charles, Amelia)
    25. Problems with Raw Co-occurrence
        • Very popular items co-occur with everything (or why it’s not very helpful to know that everybody wants a pony…)
            – Examples: Welcome document; Elevator music
        • Very widespread occurrence is not interesting as a way to generate indicators
            – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
        • What we want is anomalous co-occurrence
            – This is the source of interesting indicators of preference on which to base recommendation
    26. Get Useful Indicators from Behaviors
        1. Use log files to build a history matrix of users x items
            – Remember: this history of interactions will be sparse compared to all potential combinations
        2. Transform to a co-occurrence matrix of items x items
        3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
            – Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
            – RowSimilarityJob in Apache Mahout uses LLR
    27. Log Files [Figure: log entries for Alice, Bob, and Charles]
    28. History Matrix: Users by Items [Figure: rows for Alice, Bob, and Charles; seven check marks mark the user-item interactions]
    29. Co-occurrence Matrix: Items by Items [Figure: item-by-item matrix of co-occurrence counts, with entries of 0, 1, or 2] How do you tell which co-occurrences are useful?
    30. Co-occurrence Binary Matrix [Figure: matrix with rows and columns labeled by an item and “not” that item]
    31. Indicator Matrix: Anomalous Co-Occurrence [Figure: one row carries a check mark] Result: The marked row will be added to the indicator field in the item document…
    32. Indicator Matrix
        Sample Solr document:
            id: t4
            title: puppy
            desc: The sweetest little puppy ever.
            keywords: puppy, dog, pet
            indicators: (t1)
        That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
        Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don’t need to create a separate index for the indicators.
    33. Internals of the Recommender Engine
    34. Internals of the Recommender Engine
    35. A Quick Simplification
        • Users who do $h$ (a vector of things a user has done)
        • Also do $r$
            – $A h$: $A$ translates things into users
            – $r = A^T (A h)$: user-centric recommendations (the transpose translates back to things)
            – $r = (A^T A) h$: item-centric recommendations (change the order of operations)
    36. Nice. But we can do better?
    37. [Figure: matrix $A$, with users as rows and things as columns]
    38. [Figure: matrix $\begin{bmatrix} A_1 & A_2 \end{bmatrix}$, with users as rows and columns split into thing type 1 and thing type 2]
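    39. $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
        $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
        $$r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$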
    40. Now again, without the scary math
    41. Part 3: Implementation Examples
    42. For example
        • Users enter queries ($A$)
            – (actor = user, item = query)
        • Users view videos ($B$)
            – (actor = user, item = video)
        • $A^T A$ gives query recommendation
            – “did you mean to ask for”
        • $B^T B$ gives video recommendation
            – “you might like these videos”
    43. The punch-line
        • $B^T A$ recommends videos in response to a query
            – (isn’t that a search engine?)
            – (not quite, it doesn’t look at content or meta-data)
    44. Real-life example
        • Query: “Paco de Lucia”
        • Conventional meta-data search results:
            – “hombres del paco” times 400
            – not much else
        • Recommendation based search:
            – Flamenco guitar and dancers
            – Spanish and classical guitar
            – Van Halen doing a classical/flamenco riff
    45. Real-life example
    46. Search-based Recommendations
        • Sample document
            – Merchant Id
            – Field for text description
            – Phone
            – Address
            – Location
            – Indicator merchant id’s
            – Indicator industry (SIC) id’s
            – Indicator offers
            – Indicator text
            – Local top40
        • Sample query
            – Current location
            – Recent merchant descriptions
            – Recent merchant id’s
            – Recent SIC codes
            – Recent accepted offers
            – Local top40
        • Callouts on the slide: “Original data and meta-data” and “Derived from cooccurrence and cross-occurrence analysis” (the indicator fields) label the sample document; “Recommendation query” labels the sample query
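
The cross-occurrence idea from Part 3 (queries on one side, videos on the other, written as B-transpose times A on the slides) can be sketched with plain Python dictionaries. Everything concrete here is an assumption made for illustration: the users, queries, video titles, and function names are invented, and the scores are raw cross-occurrence counts. A real deployment would, as in Part 2, keep only the anomalous cross-occurrences (for example via LLR) and store them as indicator fields in the search engine.

```python
from collections import Counter, defaultdict

# Hypothetical behavior logs: user -> set of queries entered (A),
# and user -> set of videos viewed (B).
queries = {
    "u1": {"paco de lucia"},
    "u2": {"paco de lucia", "flamenco"},
    "u3": {"van halen"},
}
views = {
    "u1": {"flamenco guitar"},
    "u2": {"flamenco guitar", "spanish guitar"},
    "u3": {"eruption solo"},
}

def cross_occurrence(views, queries):
    """Count, for each (video, query) pair, the users who did both.

    This is the sparse equivalent of B^T A, with one row per video.
    """
    counts = defaultdict(Counter)
    for user, user_queries in queries.items():
        for video in views.get(user, ()):
            for query in user_queries:
                counts[video][query] += 1
    return counts

def recommend_videos(counts, recent_queries):
    """Score each video by summing its row over the user's recent queries."""
    scores = {}
    for video, row in counts.items():
        score = sum(row[q] for q in recent_queries)
        if score > 0:
            scores[video] = score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda v: (-scores[v], v))

counts = cross_occurrence(views, queries)
print(recommend_videos(counts, {"paco de lucia"}))
# ['flamenco guitar', 'spanish guitar']
```

Note that the recommendation never looks at video titles or descriptions: it relies only on which users issued which queries and watched which videos, which is how it differs from a conventional meta-data search.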