2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

This session will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system is built using Mahout to do off-line analysis and Solr to provide real-time recommendations. The presentation will also include enough theory to provide useful working intuitions for those desiring to adapt this design.

The entire system, including a data generator, off-line analysis scripts, Solr configurations and sample web pages, will be made available on GitHub for attendees to modify as they like.

Transcript

  • 1. SYSTEM TEARDOWN: SOLR AS A PRACTICAL RECOMMENDATION ENGINE Michael Hausenblas Twitter: @mhausenblas Chief Data Engineer EMEA, MapR Technologies
  • 2. What does Machine Learning look like?
  • 3. What does Machine Learning look like? [Slide of dense block-matrix algebra (A_1, A_2 and their transposes) with complexity bounds, including O(κ k d + k^3 d) for the high-quality case and O(κ d log k) or O(d log κ log k) for the looser-quality case.]
  • 4. Recommendations as Machine Learning
      • Observation of interactions between users taking actions and items, used as input data to the recommender model
      • Goal: suggest additional appropriate or desirable interactions
      • Example applications:
          – similar movies, music, books (topic, style, etc.)
          – map-based restaurant choices
          – suggesting sale items for e-stores or cash-register receipts
  • 5. Recommendations Recap: Behavior of a crowd helps us understand what individuals will do
  • 6. Recommendations (Alice, Charles): Alice got an apple and a puppy. Charles got a bicycle.
  • 7. Recommendations (Alice, Bob, Charles): Alice got an apple and a puppy. Bob got an apple. Charles got a bicycle.
  • 8. Recommendations (Alice, Bob, Charles): What else would Bob like?
  • 9. Recommendations (Alice, Bob, Charles): A puppy, of course!
  • 10. You get the idea of how recommenders work …
  • 11. Recommendations (Alice, Bob, Amelia, Charles): What if everybody gets a pony? What else would you recommend for Amelia?
  • 12. Recommendations (Alice, Bob, Amelia, Charles): If everybody gets a pony, it’s not a very good indicator of what else to predict ...
  • 13. Problems with Raw Co-occurrence
      • Very popular items co-occur with everything
          – Examples: welcome document; elevator music
      • Very widespread occurrence is not interesting as a way to generate indicators
          – Unless you want to offer an item that is constantly desired, such as razor blades
      • What we want is anomalous co-occurrence
          – This is the source of interesting indicators of preference on which to base recommendation
  • 14. Get Useful Indicators from Behaviors
      1. Use log files to build a history matrix of users x items
          – Remember: this history of interactions will be sparse compared to all potential combinations
      2. Transform to a co-occurrence matrix of items x items
      3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
          – Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
          – RowSimilarityJob in Apache Mahout uses LLR
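  • 15. Log Files: one entry per interaction, in order: Alice, Charles, Charles, Alice, Alice, Bob, Bob (the item in each entry was shown as an image)
  • 16. Log Files: (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
  • 17. Log Files and Dimensions
      Log entries: (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
      Users: u1 = Alice, u2 = Charles, u3 = Bob
      Things: t1, t2, t3, t4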
  • 18. History Matrix: Users by Items
                  t1    t2    t3    t4
      Alice       ✔     ✔     ✔
      Bob         ✔           ✔
      Charles                 ✔     ✔
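The step from log entries to the history matrix can be sketched in a few lines of Python. This is only an illustration built on the (user, thing) pairs from the "Log Files" slides; the function names `build_history` and `as_rows` are made up for this sketch and are not part of the demo code.

```python
from collections import defaultdict

# (user, thing) pairs as listed on the "Log Files" slides.
LOG = [("u1", "t1"), ("u2", "t4"), ("u2", "t3"), ("u1", "t2"),
       ("u1", "t3"), ("u3", "t3"), ("u3", "t1")]
USERS = {"u1": "Alice", "u2": "Charles", "u3": "Bob"}
THINGS = ["t1", "t2", "t3", "t4"]

def build_history(log):
    """Group log entries into one set of things per user (a sparse history)."""
    history = defaultdict(set)
    for user, thing in log:
        history[user].add(thing)
    return dict(history)

def as_rows(history, things):
    """Expand the sparse history into dense 0/1 rows, one row per user."""
    return {user: [1 if thing in seen else 0 for thing in things]
            for user, seen in history.items()}

history = build_history(LOG)
rows = as_rows(history, THINGS)
for user in sorted(rows):
    print(f"{USERS[user]:8s}", rows[user])
```

Keeping the history as per-user sets reflects the remark on the earlier slide that real interaction data is sparse; the dense rows are only printed here to mirror the check-mark matrix.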
  • 19. Co-occurrence Matrix: Items by Items. How do you tell which co-occurrences are useful? Use the LLR test to turn co-occurrence into indicators… [Figure: items-by-items matrix of co-occurrence counts, with cell values 0, 1 and 2.]
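As a rough illustration of the items-by-items transformation, the sketch below counts, for each pair of distinct things, how many users touched both. The per-user sets are derived from the log entries on the earlier slides; the function name `cooccurrence` is an assumption of this sketch, and at scale the talk uses Mahout for this step rather than anything like this.

```python
from collections import Counter
from itertools import combinations

# Per-user item sets, derived from the log entries on the earlier slides.
HISTORY = {
    "u1": {"t1", "t2", "t3"},  # Alice
    "u2": {"t3", "t4"},        # Charles
    "u3": {"t1", "t3"},        # Bob
}

def cooccurrence(history):
    """For every ordered pair of distinct items, count users who have both."""
    counts = Counter()
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # the matrix is symmetric
    return counts

counts = cooccurrence(HISTORY)
things = ["t1", "t2", "t3", "t4"]
for a in things:
    # The diagonal is left out, since an item trivially co-occurs with itself.
    print(a, ["-" if a == b else counts[(a, b)] for b in things])
```

The off-diagonal counts this produces are all 0, 1 or 2. Raw counts like these are what the LLR test then filters.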
  • 20. Co-occurrence Binary Matrix [Figure: binary table with "not" row and column labels and three cells marked 1.]
  • 21. Spot the Anomaly
                  A        not A
      B           13       1000
      not B       1000     100,000

                  A        not A
      B           1        0
      not B       0        10,000

                  A        not A
      B           1        0
      not B       0        2

                  A        not A
      B           10       0
      not B       0        100,000

      What conclusion do you draw from each situation?
  • 22. Spot the Anomaly
                  A        not A
      B           13       1000
      not B       1000     100,000       root LLR: 0.90

                  A        not A
      B           1        0
      not B       0        10,000        root LLR: 4.52

                  A        not A
      B           1        0
      not B       0        2             root LLR: 1.95

                  A        not A
      B           10       0
      not B       0        100,000       root LLR: 14.3

      • Root LLR is roughly like standard deviations
      • In Apache Mahout, RowSimilarityJob uses LLR
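The scores on this slide can be approximately reproduced with the standard log-likelihood ratio (G-test) for a 2×2 contingency table. The sketch below is a plain-Python version of that textbook formula, not Mahout's own implementation; the function names `llr` and `root_llr` and the cell naming (k11 = both A and B, k22 = neither) are choices made for this example.

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of counts.

    k11: A and B together, k12 and k21: one without the other, k22: neither.
    Computed as 2 * sum(k * ln(k * N / (row_total * column_total))).
    """
    n = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    cells = ((k11, row[0], col[0]), (k12, row[0], col[1]),
             (k21, row[1], col[0]), (k22, row[1], col[1]))
    total = 0.0
    for k, r, c in cells:
        if k > 0:  # empty cells contribute nothing (0 * ln 0 is taken as 0)
            total += k * math.log(k * n / (r * c))
    return max(0.0, 2.0 * total)  # guard against tiny negative rounding errors

def root_llr(k11, k12, k21, k22):
    """Square root of the LLR, roughly comparable to standard deviations."""
    return math.sqrt(llr(k11, k12, k21, k22))

# The four tables from the slide.
TABLES = [(13, 1000, 1000, 100000), (1, 0, 0, 10000),
          (1, 0, 0, 2), (10, 0, 0, 100000)]
for table in TABLES:
    print(table, round(root_llr(*table), 2))
```

The first table has the most co-occurrences (13) but the lowest score, because A and B are each so common that 13 joint events is close to what chance predicts. Ten co-occurrences with no other sightings of A or B in 100,000 events score far higher.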
  • 23. Indicator Matrix: Anomalous Co-occurrence. Significant co-occurrences → indicators. Result: the marked row will be added to the indicator field in the item document … [Figure: indicator matrix with two cells marked ✔.]
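To make the step from co-occurrence to indicators concrete, here is a small sketch that scores every item pair with LLR and keeps only the pairs above a cutoff. The data set is invented for illustration (20 users who all get a pony, 8 of whom also get an apple and a puppy), and the threshold of 3.0 and the function names are arbitrary assumptions, not values from the talk.

```python
import math
from itertools import combinations

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of counts (standard G-test form)."""
    n = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    cells = ((k11, row[0], col[0]), (k12, row[0], col[1]),
             (k21, row[1], col[0]), (k22, row[1], col[1]))
    total = sum(k * math.log(k * n / (r * c)) for k, r, c in cells if k > 0)
    return max(0.0, 2.0 * total)

def indicator_rows(history, threshold):
    """Map each item to the items whose co-occurrence with it is anomalous."""
    items = sorted(set().union(*history.values()))
    rows = {item: [] for item in items}
    for a, b in combinations(items, 2):
        both = sum(1 for seen in history.values() if a in seen and b in seen)
        if both == 0:
            continue  # never seen together, so there is nothing to indicate
        only_a = sum(1 for seen in history.values() if a in seen and b not in seen)
        only_b = sum(1 for seen in history.values() if b in seen and a not in seen)
        neither = len(history) - both - only_a - only_b
        if llr(both, only_a, only_b, neither) >= threshold:
            rows[a].append(b)
            rows[b].append(a)
    return rows

# Hypothetical data: everybody gets a pony; 8 of 20 users also get apple + puppy.
HISTORY = {f"user{i}": {"pony"} | ({"apple", "puppy"} if i < 8 else set())
           for i in range(20)}

indicators = indicator_rows(HISTORY, threshold=3.0)
print(indicators)
```

In this toy data the pony co-occurs with everything yet ends up with no indicators, while apple and puppy indicate each other. That mirrors the "everybody gets a pony" slide.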
  • 24. Indicator Matrix
      id: t4
      title: puppy
      desc: The sweetest little puppy ever.
      keywords: puppy, dog, pet
      indicators: (t1)
      That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
      Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don’t need to create a separate index for the indicators.
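A minimal sketch of how such a document might be assembled before indexing: the item's ordinary meta-data is copied and the row from the indicator matrix is attached as one more field. The field names come from the slide; treating `keywords` and `indicators` as lists, and the helper name `item_document`, are assumptions of this example. How the payload is posted to Solr is not shown here.

```python
import json

def item_document(meta, indicator_ids):
    """Return a copy of the item's meta-data with its indicator row attached."""
    doc = dict(meta)
    doc["indicators"] = list(indicator_ids)  # assumed to be a multi-valued field
    return doc

puppy = item_document(
    {
        "id": "t4",
        "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": ["puppy", "dog", "pet"],
    },
    indicator_ids=["t1"],
)

# Serialize as a JSON array of documents, ready to hand to an indexing step.
payload = json.dumps([puppy], indent=2)
print(payload)
```

Because the indicators live in the same document as the title and description, a single index can serve both ordinary search and recommendation queries.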
  • 25. Demo time!
  • 26. Internals of the Recommender Engine
  • 27. Looking Inside LucidWorks: What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles? Recommendation is “1710: Chuck Berry”
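At query time, a recommendation is then just a search: the user's recent item ids become the query terms against the indicator field, and the top hits are the recommendations. The sketch below only builds such a request URL. The host, the collection name `items` and the field name `indicators` are assumptions for illustration and are not taken from the demo's actual configuration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical Solr select endpoint; adjust host and collection to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/items/select"

def recommendation_url(recent_item_ids, rows=10):
    """Build a search URL that matches items whose indicators include the history."""
    terms = " ".join(str(item_id) for item_id in recent_item_ids)
    params = {"q": f"indicators:({terms})", "rows": rows, "wt": "json"}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"

# A new user who listened to 2122 (Fats Domino) and 303 (Beatles).
url = recommendation_url([2122, 303])
print(url)

# Decode the query string again to show the query as the search engine sees it.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```

The search engine's relevance ranking does the scoring, so items whose indicator field matches more of the user's recent history tend to rank higher.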
  • 28. [Architecture diagram, components numbered on the slide] (1) User behavior generator, (2) Presentation tier, (3) Session collector, (4) Search engine, (5) Metrics and logs, (6) History collector, (7) Cooccurrence analysis, (8) Post to search engine, (9) Diagnostic browsing. http://bita.ly/18vbbaT
  • 29. Example: search based recommendation
  • 30. Search-based recommendation
      • Sample Document
          Original data and meta-data:
          – Merchant Id
          – Field for text description
          – Phone
          – Address
          – Location
          Derived from co-occurrence analysis:
          – Indicator merchant id’s
          – Indicator industry (SIC) id’s
          – Indicator offers
          – Indicator text
          – Local Top40
      • Sample Query (recommendation query)
          – Current location
          – Recent merchant descriptions
          – Recent merchant id’s
          – Recent SIC codes
          – Recent accepted offers
          – Local Top40
  • 31. Analyze with MapReduce [Diagram: complete history → co-occurrence analysis (Mahout); co-occurrence output and item meta-data → Solr indexers (indexing) → index shards]
  • 32. Deploy with Conventional Search System [Diagram: user history → web tier → Solr (search) → index shards; item meta-data → Solr indexers]
  • 33. Outro •  Kudos to Ted Dunning, Grant Ingersoll and LucidWorks, for the idea & the demo! •  Get in touch: Twitter—@mhausenblas, @MapR •  Ah, and, btw: we’re hiring ;)