SYSTEM TEARDOWN: SOLR AS A
PRACTICAL RECOMMENDATION ENGINE

Michael Hausenblas
Twitter: @mhausenblas

Chief Data Engineer ...
2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
This session will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system is built using Mahout to do off-line analysis and Solr to provide real-time recommendations. The presentation will also include enough theory to provide useful working intuitions for those desiring to adapt this design.

The entire system including a data generator, off-line analysis scripts, Solr configurations and sample web pages will be made available on github for attendees to modify as they like.

Published in: Technology, Education
Transcript of "2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine"

  1. SYSTEM TEARDOWN: SOLR AS A PRACTICAL RECOMMENDATION ENGINE. Michael Hausenblas (Twitter: @mhausenblas), Chief Data Engineer EMEA, MapR Technologies
  2. What does Machine Learning look like?
  3. What does Machine Learning look like? [Slide filled with dense block-matrix equations that did not survive extraction. Legible fragments: O(κ k d + k³ d), labelled "high quality"; O(κ d log k) or O(d log κ log k), labelled "looser quality".]
  4. Recommendations as Machine Learning
     • Input data for the recommender model: observed interactions between users taking actions and items
     • Goal: suggest additional appropriate or desirable interactions
     • Example applications:
       – similar movies, music, books (topic, style, etc.)
       – map-based restaurant choices
       – suggesting sale items for e-stores or cash-register receipts
  5. Recommendations. Recap: Behavior of a crowd helps us understand what individuals will do
  6. Recommendations (Alice, Charles). Alice got an apple and a puppy. Charles got a bicycle.
  7. Recommendations (Alice, Bob, Charles). Alice got an apple and a puppy. Bob got an apple. Charles got a bicycle.
  8. Recommendations (Alice, Bob, Charles). What else would Bob like?
  9. Recommendations (Alice, Bob, Charles). A puppy, of course!
  10. You get the idea of how recommenders work …
  11. Recommendations (Alice, Bob, Amelia, Charles). What if everybody gets a pony? What else would you recommend for Amelia?
  12. Recommendations (Alice, Bob, Amelia, Charles). If everybody gets a pony, it’s not a very good indicator of what else to predict ...
  13. Problems with Raw Co-occurrence
      • Very popular items co-occur with everything
        – Examples: welcome document; elevator music
      • Very widespread occurrence is not interesting as a way to generate indicators
        – Unless you want to offer an item that is constantly desired, such as razor blades
      • What we want is anomalous co-occurrence
        – This is the source of interesting indicators of preference on which to base recommendation
  14. Get Useful Indicators from Behaviors
      1. Use log files to build history matrix of users x items
         – Remember: this history of interactions will be sparse compared to all potential combinations
      2. Transform to a co-occurrence matrix of items x items
      3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
         – Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
         – RowSimilarityJob in Apache Mahout uses LLR
  15. Log Files (one entry per interaction, by user): Alice, Charles, Charles, Alice, Alice, Bob, Bob
  16. Log Files (as user/thing pairs): (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
  17. Log Files and Dimensions
      • Log: (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
      • Things: t1, t2, t3, t4
      • Users: u1 = Alice, u2 = Charles, u3 = Bob
  18. History Matrix: Users by Items. Rows: Alice, Bob, Charles. Seven ✔ cells (column positions lost in extraction).
  19. Co-occurrence Matrix: Items by Items. How do you tell which co-occurrences are useful? Use LLR test to turn co-occurrence into indicators… [Matrix entries as extracted, layout lost: 1 1 2 1 0 0 - 2 1 0 0 1 1]
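The first two steps from slide 14 (log to history matrix, history matrix to item-by-item co-occurrence) can be sketched in a few lines of plain Python. This is only an illustration on the seven log entries from slide 16, not the Mahout job used in the talk; the function names `build_history` and `cooccurrence` are made up for this sketch, and the sparse dict-of-sets layout is an assumption chosen for readability.

```python
from collections import defaultdict
from itertools import combinations

# Interaction log from slide 16: (user, thing) pairs.
LOG = [("u1", "t1"), ("u2", "t4"), ("u2", "t3"), ("u1", "t2"),
       ("u1", "t3"), ("u3", "t3"), ("u3", "t1")]


def build_history(log):
    """Sparse history matrix (hypothetical helper): user -> set of things."""
    history = defaultdict(set)
    for user, thing in log:
        history[user].add(thing)
    return dict(history)


def cooccurrence(history):
    """For each unordered pair of distinct things, count users who have both."""
    counts = defaultdict(int)
    for things in history.values():
        for a, b in combinations(sorted(things), 2):
            counts[(a, b)] += 1
    return dict(counts)


history = build_history(LOG)
cooc = cooccurrence(history)

for pair, count in sorted(cooc.items()):
    print(pair, count)
```

Only pairs that actually occur together are stored, which mirrors the remark on slide 14 that the history is sparse compared to all potential combinations.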
  20. Co-occurrence Binary Matrix [matrix figure; surviving labels: "not", "not"; surviving entries: 1, 1, 1]
  21. Spot the Anomaly. What conclusion do you draw from each situation?

               A        not A
      B        13       1000
      not B    1000     100,000

               A        not A
      B        1        0
      not B    0        10,000

               A        not A
      B        1        0
      not B    0        2

               A        not A
      B        10       0
      not B    0        100,000

  22. Spot the Anomaly (same four tables, with root LLR scores)

               A        not A
      B        13       1000
      not B    1000     100,000      root LLR: 0.90

               A        not A
      B        1        0
      not B    0        10,000       root LLR: 4.52

               A        not A
      B        1        0
      not B    0        2            root LLR: 1.95

               A        not A
      B        10       0
      not B    0        100,000      root LLR: 14.3

      • Root LLR is roughly like standard deviations
      • In Apache Mahout, RowSimilarityJob uses LLR
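To make the scores on slide 22 concrete, here is a minimal sketch of the log-likelihood ratio for a 2x2 table, written in the G² form (twice the sum of observed × log(observed / expected)). It is not Mahout's code; the function names and the argument order (k11 = both A and B, k12 = B without A, k21 = A without B, k22 = neither) are assumptions for this example. It also returns an unsigned root, ignoring the direction of the association.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing (0 * log 0 is taken as 0)
            expected = rows[i] * cols[j] / n
            total += k * math.log(k / expected)
    return 2.0 * total


def root_llr(k11, k12, k21, k22):
    """Unsigned square root of the LLR, the 'roughly standard deviations' scale."""
    return math.sqrt(max(llr(k11, k12, k21, k22), 0.0))


# The four tables from the slide, in order.
tables = [(13, 1000, 1000, 100000), (1, 0, 0, 10000), (1, 0, 0, 2), (10, 0, 0, 100000)]
scores = [root_llr(*t) for t in tables]
print([round(s, 2) for s in scores])
```

The results should land close to the 0.90, 4.52, 1.95 and 14.3 shown on the slide: the first table has large counts but almost exactly the overlap expected by chance, while the last one has a small but far-from-chance overlap.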
  23. Indicator Matrix: Anomalous Co-occurrence. Result: The marked row will be added to the indicator field in the item document … Significant co-occurrences → indicators
  24. Indicator Matrix. That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine:

      id: t4
      title: puppy
      desc: The sweetest little puppy ever.
      keywords: puppy, dog, pet
      indicators: (t1)

      Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don’t need to create a separate index for the indicators.
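As a rough illustration of the note on slide 24, the sketch below merges one row of an indicator matrix into the item's existing meta-data, so that the indicators travel as one more field of the same document. The helper name `build_item_documents` and the dict layout are hypothetical; the field names follow the sample document on the slide, and serializing to JSON is just one plausible way to hand such documents to an indexer.

```python
import json


def build_item_documents(items, indicators):
    """Attach each item's indicator ids to its meta-data as one extra field."""
    docs = []
    for item_id, meta in items.items():
        doc = {"id": item_id, **meta}
        # Items without significant co-occurrences get an empty indicator list.
        doc["indicators"] = sorted(indicators.get(item_id, []))
        docs.append(doc)
    return docs


# Item meta-data and indicator row taken from the slide's example (item t4).
items = {
    "t4": {
        "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": ["puppy", "dog", "pet"],
    }
}
indicators = {"t4": {"t1"}}

docs = build_item_documents(items, indicators)
payload = json.dumps(docs, indent=2)
print(payload)
```

Keeping indicators in the same document means a single index serves both ordinary search over title, description and keywords and the recommendation queries.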
  25. Demo time!
  26. Internals of the Recommender Engine
  27. Looking Inside LucidWorks. What to recommend if new user listened to 2122: Fats Domino & 303: Beatles? Recommendation is “1710 : Chuck Berry”
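The lookup on slide 27 amounts to an ordinary search: the user's recent item ids become the query terms, matched against the indicator field of every item document. A minimal sketch of building such a request follows. The host, the collection name `items`, the field name `indicators`, and the returned fields are all assumptions for illustration; only the standard `q`, `rows` and `fl` parameters of Solr's select handler are relied on, and nothing is actually sent.

```python
from urllib.parse import urlencode


def recommendation_query(recent_item_ids, field="indicators", rows=10):
    """Build select-handler parameters that match items whose indicator
    field contains any of the user's recent item ids."""
    terms = " ".join(str(item_id) for item_id in recent_item_ids)
    return {
        "q": f"{field}:({terms})",   # terms are OR-ed under Solr's default operator
        "rows": rows,
        "fl": "id,title,score",       # hypothetical fields to return
    }


# New user who listened to 2122 (Fats Domino) and 303 (Beatles).
params = recommendation_query([2122, 303])

# Hypothetical local Solr instance and collection name.
url = "http://localhost:8983/solr/items/select?" + urlencode(params)
print(url)
```

Because the search engine ranks by how well the indicator field matches, items sharing more indicators with the user's history tend to score higher, which is what makes a conventional search system usable as the serving layer.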
  28. Architecture components (numbered as on the slide): User behavior generator (1), Presentation tier (2), Session collector (3), Search engine (4), Metrics and logs (5), History collector (6), Cooccurrence analysis (7), Post to search engine (8), Diagnostic browsing (9). http://bita.ly/18vbbaT
  29. Example: search based recommendation
  30. Search-based recommendation
      • Sample Document
        Original data and meta-data:
        – Merchant Id
        – Field for text description
        – Phone
        – Address
        – Location
        Derived from co-occurrence analysis:
        – Indicator merchant id’s
        – Indicator industry (SIC) id’s
        – Indicator offers
        – Indicator text
        – Local Top40
      • Sample Query (recommendation query)
        – Current location
        – Recent merchant descriptions
        – Recent merchant id’s
        – Recent SIC codes
        – Recent accepted offers
        – Local Top40
  31. Analyze with MapReduce [diagram labels: complete history; Co-occurrence (Mahout); Item meta-data; Solr Indexer; Solr Indexer; Solr indexing; Index shards]
  32. Deploy with Conventional Search System [diagram labels: user history; Web tier; Item meta-data; Solr Indexer; Solr Indexer; Solr search; Index shards]
  33. Outro
      • Kudos to Ted Dunning, Grant Ingersoll and LucidWorks, for the idea & the demo!
      • Get in touch on Twitter: @mhausenblas, @MapR
      • Ah, and, btw: we’re hiring ;)