2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

This session will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system is built using Mahout to do off-line analysis and Solr to provide real-time recommendations. The presentation will also include enough theory to provide useful working intuitions for those desiring to adapt this design.

The entire system, including a data generator, off-line analysis scripts, Solr configurations and sample web pages, will be made available on GitHub for attendees to modify as they like.

Published in: Technology, Education

Transcript

  • 1. SYSTEM TEARDOWN: SOLR AS A PRACTICAL RECOMMENDATION ENGINE Michael Hausenblas Twitter: @mhausenblas Chief Data Engineer EMEA, MapR Technologies
  • 2. What does Machine Learning look like?
  • 3. What does Machine Learning look like? [Slide of dense formulas; layout lost in extraction. Recoverable fragments: the block product $\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$, and the costs $O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small $k$ (high quality) versus $O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger $k$ (looser quality).]
  • 4. Recommendations as Machine Learning
      • Observed interactions between users (taking actions) and items are the input data for the recommender model
      • Goal: suggest additional appropriate or desirable interactions
      • Example applications:
          – similar movies, music, books (topic, style, etc.)
          – map-based restaurant choices
          – suggesting sale items for e-stores or cash-register receipts
  • 5. Recommendations recap: the behavior of a crowd helps us understand what individuals will do
  • 6. Recommendations: Alice got an apple and a puppy. Charles got a bicycle.
  • 7. Recommendations: Alice got an apple and a puppy. Bob got an apple. Charles got a bicycle.
  • 8. Recommendations: What else would Bob like?
  • 9. Recommendations: A puppy, of course!
  • 10. You get the idea of how recommenders work …
  • 11. Recommendations: What if everybody (Alice, Bob, Amelia, Charles) gets a pony? What else would you recommend for Amelia?
  • 12. Recommendations: If everybody gets a pony, it's not a very good indicator of what else to predict ...
  • 13. Problems with Raw Co-occurrence
      • Very popular items co-occur with everything
          – Examples: welcome document; elevator music
      • Very widespread occurrence is not interesting as a way to generate indicators
          – Unless you want to offer an item that is constantly desired, such as razor blades
      • What we want is anomalous co-occurrence
          – This is the source of interesting indicators of preference on which to base recommendations
  • 14. Get Useful Indicators from Behaviors
      1. Use log files to build a history matrix of users x items
          – Remember: this history of interactions will be sparse compared to all potential combinations
      2. Transform to a co-occurrence matrix of items x items
      3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
          – The log-likelihood ratio (LLR) helps judge which co-occurrences can be used with confidence as indicators of preference
          – RowSimilarityJob in Apache Mahout uses LLR
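  • 15. Log Files (one entry per interaction, in order): Alice, Charles, Charles, Alice, Alice, Bob, Bob
  • 16. Log Files (user, thing): (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
  • 17. Log Files and Dimensions
      • Log: (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
      • Users: u1 = Alice, u2 = Charles, u3 = Bob
      • Things: t1, t2, t3, t4
  • 18. History Matrix: Users by Items. Rows: Alice, Bob, Charles; seven ✔ marks (cell positions lost in extraction)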
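The step from log file to history matrix can be sketched in a few lines of Python. This is only an illustration using the seven (user, thing) pairs and the user names from the log slides; the function names `build_history` and `to_dense` are made up for this sketch and are not part of Mahout or of the demo code.

```python
from collections import defaultdict

# (user, thing) events as listed on the "Log Files" slides.
LOG = [("u1", "t1"), ("u2", "t4"), ("u2", "t3"), ("u1", "t2"),
       ("u1", "t3"), ("u3", "t3"), ("u3", "t1")]
USERS = {"u1": "Alice", "u2": "Charles", "u3": "Bob"}


def build_history(log):
    """Map each user to the set of things they interacted with (sparse rows)."""
    history = defaultdict(set)
    for user, thing in log:
        history[user].add(thing)
    return dict(history)


def to_dense(history, users, things):
    """Expand the sparse history into a 0/1 users-by-items matrix."""
    return [[1 if thing in history.get(user, set()) else 0 for thing in things]
            for user in users]


history = build_history(LOG)
things = sorted({thing for _, thing in LOG})   # t1..t4
row_order = ["u1", "u3", "u2"]                 # Alice, Bob, Charles
matrix = to_dense(history, row_order, things)

for user, row in zip(row_order, matrix):
    print(f"{USERS[user]:8s}", row)
```

The dense form is only for display; with real logs the sparse per-user sets are the natural representation, since most user and item combinations never occur.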
  • 19. Co-occurrence Matrix: Items by Items. How do you tell which co-occurrences are useful? Use the LLR test to turn co-occurrence into indicators… Cell values as extracted (row and column layout lost): 1 1 2 1 0 0 - 2 1 0 0 1 1
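As a rough illustration of the items-by-items step, the sketch below counts, for every pair of things, how many users have both. It assumes the per-user history implied by the log slides (u1 = Alice, u3 = Bob, u2 = Charles); the function name `cooccurrence` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Per-user history implied by the log slides.
HISTORY = {
    "Alice": {"t1", "t2", "t3"},
    "Bob": {"t1", "t3"},
    "Charles": {"t3", "t4"},
}


def cooccurrence(history):
    """Count, for each ordered pair of distinct items, the users who have both."""
    counts = Counter()
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


counts = cooccurrence(HISTORY)
items = sorted(set().union(*HISTORY.values()))
for a in items:
    # The diagonal is shown as "-" because an item trivially co-occurs with itself.
    print(a, ["-" if a == b else counts[(a, b)] for b in items])
```

Under these assumptions the off-diagonal cells contain six 1s, two 2s and four 0s, the same set of values that survives from the slide, although the slide's row and column order cannot be recovered.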
  • 20. Co-occurrence Binary Matrix [figure residue: labels "not", "not"; cell values 1, 1, 1]
  • 21. Spot the Anomaly. What conclusion do you draw from each situation?

      A and B | B, not A | A, not B | neither
      13      | 1000     | 1000     | 100,000
      1       | 0        | 0        | 10,000
      1       | 0        | 0        | 2
      10      | 0        | 0        | 100,000
  • 22. Spot the Anomaly

      A and B | B, not A | A, not B | neither | root LLR
      13      | 1000     | 1000     | 100,000 | 0.90
      1       | 0        | 0        | 10,000  | 4.52
      1       | 0        | 0        | 2       | 1.95
      10      | 0        | 0        | 100,000 | 14.3

      • Root LLR is roughly like standard deviations
      • In Apache Mahout, RowSimilarityJob uses LLR
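The root LLR scores for the four tables can be checked with a short sketch. It uses the textbook G² form of the log-likelihood ratio for a 2x2 contingency table, not Mahout's own code, and the function names `llr` and `root_llr` are chosen for this example only.

```python
import math


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11 = A and B together, k12 = B without A,
    k21 = A without B,      k22 = neither.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    table = ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
             (k21, rows[1], cols[0]), (k22, rows[1], cols[1]))
    # Empty cells contribute nothing (the limit of k * log k as k -> 0 is 0).
    return 2.0 * sum(k * math.log(k * n / (r * c)) for k, r, c in table if k > 0)


def root_llr(k11, k12, k21, k22):
    """Square root of LLR; max() guards against tiny negative rounding error."""
    return math.sqrt(max(llr(k11, k12, k21, k22), 0.0))


# The four tables from the slide, in the order they appear.
TABLES = [(13, 1000, 1000, 100_000), (1, 0, 0, 10_000),
          (1, 0, 0, 2), (10, 0, 0, 100_000)]

for table in TABLES:
    print(table, round(root_llr(*table), 2))
```

The first table has by far the most co-occurrences (13) yet the lowest score, because 13 is close to what chance alone would produce given how common A and B are. The last table scores highest: ten co-occurrences and no occurrences apart is very hard to explain by chance.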
  • 23. Indicator Matrix: Anomalous Co-occurrence. Significant co-occurrences become indicators (✔ cells). Result: the marked row will be added to the indicator field in the item document …
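Putting the pieces together, a minimal sketch of the indicator step scores every co-occurring pair with LLR and keeps the pairs above a cutoff. It assumes the toy history from the log slides; the cutoff `min_llr=0.5` is an arbitrary value picked for three users, and real data would need a much stricter threshold. This is not how RowSimilarityJob is implemented, only the same idea at toy scale.

```python
import math
from itertools import combinations

# Toy history implied by the log slides: u1 = Alice, u3 = Bob, u2 = Charles.
HISTORY = {
    "Alice": {"t1", "t2", "t3"},
    "Bob": {"t1", "t3"},
    "Charles": {"t3", "t4"},
}


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    table = ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
             (k21, rows[1], cols[0]), (k22, rows[1], cols[1]))
    return 2.0 * sum(k * math.log(k * n / (r * c)) for k, r, c in table if k > 0)


def indicators(history, min_llr=0.5):
    """Return {item: list of indicator items}; min_llr is a hypothetical cutoff."""
    items = sorted(set().union(*history.values()))
    n_users = len(history)
    result = {item: [] for item in items}
    for a, b in combinations(items, 2):
        both = sum(1 for s in history.values() if a in s and b in s)
        if both == 0:
            continue  # only pairs that actually co-occur can become indicators
        only_a = sum(1 for s in history.values() if a in s and b not in s)
        only_b = sum(1 for s in history.values() if b in s and a not in s)
        neither = n_users - both - only_a - only_b
        if llr(both, only_a, only_b, neither) >= min_llr:
            result[a].append(b)
            result[b].append(a)
    return result


for item, inds in indicators(HISTORY).items():
    print(item, inds)
```

With this toy data, every pair involving t3 scores exactly 0, because all three users have t3 and it therefore says nothing about anyone in particular. This is the "everybody gets a pony" effect from the earlier slides. The only pair that survives is t1 and t2.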
  • 24. Indicator Matrix
      id: t4
      title: puppy
      desc: The sweetest little puppy ever.
      keywords: puppy, dog, pet
      indicators: (t1)
      That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
      Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don't need to create a separate index for the indicators.
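Once the indicator field is indexed, a recommendation is an ordinary search: the user's recent item ids are the query terms, matched against the `indicators` field. The sketch below only builds the request URL and does not contact a server. The host, port and core name (`items`) are assumptions for illustration, as is the helper name `recommendation_url`; `q`, `fl`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name; adjust to your own deployment.
SOLR_SELECT = "http://localhost:8983/solr/items/select"

# The item document from the slide, written as a plain dict.
PUPPY_DOC = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],
    "indicators": ["t1"],
}


def recommendation_url(recent_item_ids, rows=10):
    """Build a select URL that matches the user's recent items against indicators."""
    if not recent_item_ids:
        raise ValueError("need at least one recent item id")
    params = {
        "q": "indicators:({})".format(" ".join(recent_item_ids)),
        "fl": "id,title,score",  # fields to return
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


# A user whose history contains t1 would match PUPPY_DOC via its indicators field.
print(recommendation_url(["t1"]))
```

Because the query is plain text matching, Solr's normal relevance ranking orders the candidates, and the same request can be combined with the usual filters on title, keywords or other metadata.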
  • 25. Demo time!
  • 26. Internals of the Recommender Engine
  • 27. Looking Inside LucidWorks: What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles? Recommendation is “1710: Chuck Berry”
  • 28. Components, as numbered on the slide:
      (1) User behavior generator
      (2) Presentation tier
      (3) Session collector
      (4) Search engine
      (5) Metrics and logs
      (6) History collector
      (7) Cooccurrence analysis
      (8) Post to search engine
      (9) Diagnostic browsing
      Link on slide: http://bita.ly/18vbbaT
  • 29. Example: search based recommendation
  • 30. Search-based recommendation
      • Sample Document
          – Original data and meta-data: Merchant Id; Field for text description; Phone; Address; Location
          – Derived from co-occurrence analysis: Indicator merchant id’s; Indicator industry (SIC) id’s; Indicator offers; Indicator text; Local Top40
      • Sample Query (recommendation query)
          – Current location
          – Recent merchant descriptions
          – Recent merchant id’s
          – Recent SIC codes
          – Recent accepted offers
          – Local Top40
  • 31. Analyze with MapReduce [diagram labels: complete history; co-occurrence (Mahout); item meta-data; Solr indexers (indexing); index shards]
  • 32. Deploy with Conventional Search System [diagram labels: user history; web tier; item meta-data; Solr indexers; Solr (search); index shards]
  • 33. Outro
      • Kudos to Ted Dunning, Grant Ingersoll and LucidWorks for the idea & the demo!
      • Get in touch on Twitter: @mhausenblas, @MapR
      • Ah, and, btw: we’re hiring ;)
