This is my Strata NY talk about how to build recommendation engines using common items. In particular, I show how multi-modal recommendations can be built using the same framework.
Using Mahout and a Search Engine for Recommendation - Ted Dunning
I presented this talk at the Open World Forum in Paris in 2013. The ideas here are that you can do basic recommendations and extended forms of recommendation such as intelligent search or cross recommendation or multi-modal recommendation using Mahout's cooccurrence analysis together with a search engine.
Ted Dunning discusses recommendation systems and Apache Mahout. He explains how recommendation works using user-item interaction data to identify patterns and make predictions. Recommendation engines can suggest additional items like movies, music, or restaurants based on a user's preferences and behaviors. Dunning outlines the process of building recommendation models with Mahout including transforming data into user-item matrices and using techniques like log likelihood ratios to identify meaningful relationships.
The document discusses the evolution of data and analytics. It notes that early predictions of future "big data" were inaccurate and that scaling laws are changing radically. The document then summarizes MapR's data platform which enhances Apache Hadoop to provide better performance, reliability, integration and administration compared to other Hadoop distributions. MapR delivers a unified platform for file, analytics and NoSQL workloads with innovations like lockless storage and high throughput.
This is the position talk that I gave at CIKM. Included are 4 algorithms that I feel don't get much academic attention, but which are very important industrially. It isn't necessarily true that these algorithms *should* get academic attention, but I do feel that it is true that they are quite important pragmatically speaking.
Apache Mahout is changing radically. Here is a report on what is coming, notably including an R-like domain-specific language that can use multiple computational engines such as Spark.
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time - Ted Dunning
This talk describes how indicator-based recommendations can be evolved in real time. Normally, indicator-based recommendations use a large off-line computation to understand the general structure of items to be recommended and then make recommendations in real-time to users based on a comparison of their recent history versus the large-scale product of the off-line computation.
In this talk, I show how the same components of the off-line computation that guarantee linear scalability in a batch setting also give strict real-time bounds on the cost of a practical real-time implementation of the indicator computation.
These are the slides from my talk at FAR Con in Minneapolis recently. The topics are the implications of buried treasure hoards on data security, horror stories and new, simpler and provably secure methods for public data disclosure.
Many statistics are impossible to compute precisely on streaming data. There are some very clever algorithms, however, which allow us to compute very good approximations of these values efficiently in terms of CPU and memory.
The document discusses machine learning and recommendations. It provides an overview of Mahout and how it can be used to build recommender systems. Specifically, it explains how recommendation algorithms work by analyzing cooccurrence patterns in user behavior logs. It then provides a hypothetical example of a working recommender system that collects user history and item metadata, performs cooccurrence analysis with Mahout, and posts results to a search engine to provide recommendations.
This talk focuses on how larger data sets are not only enabling advanced techniques, but also increasing the number of problems within reach of relatively simple techniques, that is "cheap learning".
The document discusses time series data storage and analysis. It begins with an overview of how time series data can be collected from sensors at high volumes, such as millions of data points per second. It then discusses challenges with storing and analyzing this volume of time series data using traditional databases. The document proposes storing time series data in wide tables in MapR-DB and describes how this can enable ingesting data at very high rates, such as over 100 million data points per second. This approach provides viable solutions for industrial applications generating large volumes of time series data.
This talk describes the general architecture common to anomaly detection systems that are based on probabilistic models. By examining several realistic use cases, I illustrate the common themes and practical implementation methods.
Recent work in recommendations allows some really amazing simplicity of implementation while extending the inputs handled to multiple kinds of interactions against items different from the ones being recommended.
Anomaly Detection - New York Machine Learning - Ted Dunning
Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.
Cognitive computing with big data, high tech and low tech approaches - Ted Dunning
I explain some very approachable methods for analyzing big data via a detour through clipper ships and the 19th century open source scene.
Note that I mixed up the route of the Flying Cloud record in this talk. The Flying Cloud's record was actually from New York to San Francisco and was even more impressive than what I said. The usual time had been about 180 days. With Maury's charts, the time was reduced to about 135 days. The Flying Cloud's time was 89 days.
Thanks to Chen Kung for noticing my error.
This was one of the talks that I gave at the Strata San Jose conference. I migrated my topic a bit, but here is the original abstract:
Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.
Messaging solutions have existed for a long time. However, when compared to legacy systems, newer solutions like Apache Kafka offer higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than legacy systems and have vastly different architectures. But do these benefits outweigh any tradeoffs in functionality? Ted Dunning dives into the architectural details and tradeoffs of both legacy and new messaging solutions to find the ideal messaging system for Hadoop.
Topics include:
* Queues versus logs
* Security issues like authentication, authorization, and encryption
* Scalability and performance
* Handling applications that span multiple data centers
* Multitenancy considerations
* APIs, integration points, and more
These are the slides that we used to ignite the conversation with the audience at Hadoop Summit EU. Come over to the Mahout dev list to be part of the ongoing conversation.
This talk shows practical methods for finding changes in a variety of kinds of data, as well as giving real-world examples from finance, telecom, systems monitoring and natural language processing.
This document discusses combining real-time and batch processing by using a semi-aggregated strategy with snapshots. It presents a simple example of counting events in real-time using Storm and periodically aggregating the counts in Hadoop. This strategy allows for real-time processing with Storm while still being able to run batch jobs on historical data in Hadoop. It also discusses how this approach can be extended to other online algorithms like Bayesian bandits by representing distributions with sampling and updating counts in real-time.
The document discusses how different technologies like Hadoop, Storm, Solr, and D3 can be integrated to build a real-time search and recommendation system. It provides examples of how unprocessed data can be stored in Hadoop and indexed by Storm and Solr in real-time to power a search engine. User queries, clicks, and engagement data would then be analyzed to update the search index and provide personalized recommendations. Visualizations from usage data could also be generated in real-time using D3 and node.js.
This document discusses techniques for making recommendations in real-time using co-occurrence analysis. It describes how interaction cut and frequency cut downsampling allow batch co-occurrence analysis to scale to large datasets. These same techniques also enable an online approach to updating recommendations in real-time with each new user interaction. The key insights are that limiting user histories and item frequencies results in a bounded number of updates needed for each new data point, allowing real-time recommendations using MapR's distributed data platform.
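To make the scaling argument concrete, here is a minimal Java sketch of the two cuts. The class name and the specific limits are hypothetical, and production implementations such as Mahout's sample randomly rather than simply keeping the first interactions; the point is only that once both caps are hit, a new observation can touch at most a bounded amount of state.

    import java.util.HashMap;
    import java.util.Map;

    public class DownSampler {
        static final int MAX_USER_HISTORY = 300;    // interaction cut: interactions kept per user
        static final int MAX_ITEM_FREQUENCY = 500;  // frequency cut: occurrences counted per item

        private final Map<String, Integer> userCount = new HashMap<>();
        private final Map<String, Integer> itemCount = new HashMap<>();

        /** Returns true if this (user, item) interaction should enter the co-occurrence analysis. */
        public boolean offer(String user, String item) {
            // Very popular items carry little information, so stop counting them past the cap.
            if (itemCount.merge(item, 1, Integer::sum) > MAX_ITEM_FREQUENCY) {
                return false;
            }
            // Hyperactive users likewise get truncated, which bounds the cost of any one update.
            return userCount.merge(user, 1, Integer::sum) <= MAX_USER_HISTORY;
        }
    }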
MapR offers an enterprise distribution of Hadoop that supports a broad range of use cases. It has been chosen by major companies like Google and Amazon for its capabilities. The document discusses three stories of how companies have benefited from using MapR: 1) A telecom company was able to offload ETL processing to gain a 20x cost performance advantage. 2) A company improved the performance of a recommendation engine on large datasets. 3) A machine learning expert was able to reproduce models accurately and explain previous recommendations.
The document discusses various techniques for anomaly detection in streaming data. It begins by outlining the basic steps of building an anomaly detection model and detecting anomalies in new data. It then discusses challenges in setting an appropriate threshold to determine what constitutes an anomaly. The document explores using adaptive thresholds and algorithms like t-digest to help determine outliers. It also discusses challenges like non-stationary data and more complex models, as well as techniques like clustering and autoencoders to model time series data.
This document discusses t-digest, which provides a compact way to represent a distribution of values. T-digest uses adaptive bins that are smaller near the edges, allowing it to accurately track quantiles even with a limited number of bins. It works by taking data samples, sorting them, and grouping them into bins while respecting a maximum size. The bins can then be merged across samples or time periods. T-digest is useful for applications that need to track distributions over many variables or time periods with limited space.
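As an illustration, here is a minimal sketch using the t-digest library (this assumes the com.tdunning:t-digest artifact and its createMergingDigest factory; the stream of Gaussian samples is just stand-in data):

    import com.tdunning.math.stats.TDigest;
    import java.util.Random;

    public class TDigestDemo {
        public static void main(String[] args) {
            // a compression of 100 keeps roughly a few hundred centroids, however long the stream
            TDigest digest = TDigest.createMergingDigest(100);
            Random rand = new Random(42);
            for (int i = 0; i < 1_000_000; i++) {
                digest.add(rand.nextGaussian());   // one value at a time, constant memory
            }
            // accuracy is best near the tails because the adaptive bins are smallest there
            System.out.println("median ~ " + digest.quantile(0.5));
            System.out.println("99.9th percentile ~ " + digest.quantile(0.999));
            System.out.println("P(x < 2) ~ " + digest.cdf(2.0));
        }
    }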
The document describes the "lambda + epsilon" architecture for combining real-time and batch processing using Hadoop and Storm. It addresses the challenge that Hadoop is not suitable for real-time processing, while Storm lacks batch processing capabilities. The architecture divides computation into two parts for real-time approximation and long-term accurate results to provide a blended view over time.
I gave this talk at Buzzwords just now to fill in for an ill speaker.
The topics include things that are being added to or taken out of Mahout. These include cruft (out), fast clustering (in), nearest neighbor search (in), Pig bindings for Mahout (who knows).
Co-occurrence Based Recommendations with Mahout, Scala and Spark - sscdotopen
This document discusses techniques for co-occurrence-based recommendations using Apache Mahout, Scala, and Spark. It describes how Mahout computes the co-occurrence matrix AᵀA using a row-outer product formulation that executes in a single pass over the row-partitioned matrix A. It also explains how the computation is optimized physically by using specialized operators like Transpose-Times-Self to avoid repartitioning the matrix. Finally, it provides examples of how the distributed computation of AᵀA is implemented across worker nodes.
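The row-outer-product formulation is easy to see in miniature. The following is a single-machine Java sketch, not Mahout's actual distributed operator: each row of A (one user's item set) contributes its own outer product, so AᵀA accumulates in one pass over the rows.

    import java.util.*;

    public class SelfSimilarity {
        /** One pass over the rows of a sparse binary user-item matrix A, accumulating A^T A. */
        public static Map<String, Map<String, Integer>> ata(Iterable<Set<String>> rows) {
            Map<String, Map<String, Integer>> c = new HashMap<>();
            for (Set<String> items : rows) {
                for (String i : items) {
                    for (String j : items) {
                        if (!i.equals(j)) {   // the diagonal is just item frequency; skip it
                            c.computeIfAbsent(i, k -> new HashMap<>()).merge(j, 1, Integer::sum);
                        }
                    }
                }
            }
            return c;
        }

        public static void main(String[] args) {
            List<Set<String>> rows = List.of(
                    Set.of("apple", "puppy"),
                    Set.of("apple", "puppy", "pony"),
                    Set.of("pony", "kitten"));
            System.out.println(ata(rows));   // e.g. {apple={puppy=2, pony=1}, ...}
        }
    }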
Here are the key steps for Exercise 3:
1. Create a FileDataModel object, passing in the CSV file
2. Instantiate different UserSimilarity objects like PearsonCorrelationSimilarity, EuclideanDistanceSimilarity
3. Calculate similarities between users by calling userSimilarity() on the similarity objects, passing the user IDs
4. Print out the similarities to compare the different measures
The CSV file should contain enough user preference data (user IDs, item IDs, ratings) for the similarity calculations to be meaningful. This exercise demonstrates how to easily plug different similarity functions into Mahout's common interfaces.
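A minimal sketch of those four steps, assuming the Mahout 0.x Taste API and a hypothetical ratings.csv of userID,itemID,rating lines:

    import java.io.File;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class Exercise3 {
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("ratings.csv"));      // step 1
            UserSimilarity pearson = new PearsonCorrelationSimilarity(model);  // step 2
            UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
            long user1 = 1L, user2 = 2L;                                       // step 3
            System.out.println("Pearson:   " + pearson.userSimilarity(user1, user2));
            System.out.println("Euclidean: " + euclidean.userSimilarity(user1, user2)); // step 4
        }
    }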
Multi-modal recommendation engines use multiple kinds of behavior as input and can be implemented using standard search engine technology. I show how and why, starting with basic recommendations all the way through full multi-modal systems.
Utilizing Mahout, implement a collaborative filtering framework using historical data (in this instance, movie ratings by 943 users) to provide item-based recommendations. Three item-based recommendations will be provided for each user.
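One plausible reading of that assignment, using Mahout's Taste API (the file name and the choice of an LLR-based item similarity are assumptions, not part of the assignment text):

    import java.io.File;
    import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

    public class ItemBasedRecs {
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("ratings.csv"));   // the 943 users' ratings
            ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
            GenericItemBasedRecommender rec = new GenericItemBasedRecommender(model, similarity);
            for (LongPrimitiveIterator users = model.getUserIDs(); users.hasNext();) {
                long userId = users.nextLong();
                for (RecommendedItem item : rec.recommend(userId, 3)) {     // three recs per user
                    System.out.println(userId + "\t" + item.getItemID() + "\t" + item.getValue());
                }
            }
        }
    }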
How to create a cutting-edge recommender that is fast, scalable, can use almost any applicable data, and is extremely flexible for use in many different contexts. Uses Spark, Mahout, and a search engine.
Latent factor models for Collaborative Filtering - sscdotopen
The document discusses latent factor models for collaborative filtering. It describes how latent factor models (1) map both users and items to a latent factor space to characterize them, (2) approximate ratings as the dot product of user and item vectors, and (3) can be used to predict unknown ratings. It also covers techniques like stochastic gradient descent and alternating least squares for training latent factor models on explicit and implicit feedback data.
Matrix Factorization Techniques For Recommender Systems - Lei Guo
The document discusses matrix factorization techniques for recommender systems. It begins by describing common recommender system strategies like content-based and collaborative filtering approaches. It then introduces matrix factorization methods, which characterize both users and items by vectors of latent factors inferred from rating patterns. The basic matrix factorization model approximates user ratings as the inner product of user and item vectors in the joint latent factor space. Learning algorithms like stochastic gradient descent and alternating least squares are used to compute the user and item vectors by minimizing a regularized error function on known ratings.
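The heart of the SGD variant fits in a few lines. This is a generic sketch rather than the document's exact algorithm; the learning rate and regularization constant are illustrative:

    public class MfSgd {
        /** One stochastic gradient step for r_ui ~ p_u . q_i with L2 regularization. */
        static void sgdStep(double[] pu, double[] qi, double rating, double lr, double reg) {
            double pred = 0;
            for (int f = 0; f < pu.length; f++) {
                pred += pu[f] * qi[f];              // current estimate of the rating
            }
            double err = rating - pred;             // error on this known rating
            for (int f = 0; f < pu.length; f++) {
                double p = pu[f], q = qi[f];
                pu[f] += lr * (err * q - reg * p);  // move user factors along the gradient
                qi[f] += lr * (err * p - reg * q);  // and item factors symmetrically
            }
        }
    }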
DFW Big Data talk on Mahout Recommenders - Ted Dunning
This talk focused on how to build recommenders using new technology and capabilities from Mahout. The key here is that recommenders can be built much more easily than you might expect.
This document discusses predictive analytics using Hadoop. It provides examples of recommendation and classification using big data. It describes obtaining large training datasets through crowdsourcing and implicit feedback. It also discusses operational considerations for predictive models, including snapshotting data, leveraging NFS for ingestion, and ensuring high availability. The document concludes with a question and answer section.
Complement Deep Learning with Cheap Learning: Recent results of deep learning on hard problems have set the data world all atwitter and made deep learning the fashion of the time.
But it is very important to remember that as data expands, the learning problems that are encountered are often nearly green field problems and it is often possible to solve these problems using remarkably simple techniques. Indeed, on many problems these simple techniques will give results as good as more complex ones, not because they are profound, but because many problems become simpler at scale.
That said, it isn’t always obvious how to do this. I will describe some of these techniques and show how they can be applied in practice.
The document discusses how different technologies like Hadoop, Storm, Solr, and D3 can be integrated together using common storage platforms. It provides examples of how real-time and batch processing can be combined for applications like search and recommendations. The document advocates that hybrid systems integrating these technologies can provide benefits over traditional tiered architectures and be implemented today.
Ted Dunning, Chief Application Architect, MapR at MLconf SF - MLconf
The document discusses techniques for generating recommendations based on item co-occurrence analysis. It describes how to build a user-item history matrix from log files and transform it into an item-item co-occurrence matrix. It discusses using anomalous co-occurrences as indicators to make recommendations and scaling the analysis using interaction cuts and frequency limits. It also describes how to update the co-occurrence matrix incrementally in real-time to enable online recommendations.
Ted Dunning presents on algorithms that really matter for deploying machine learning systems. The most important advances are often not the algorithms but how they are implemented, including making them deployable, robust, transparent, and with the proper skillsets. Clever prototypes don't matter if they can't be standardized. Sketches that produce many weighted centroids can enable online clustering at scale. Recursive search and recommendations, where one implements the other, can also be important.
The document discusses how big data has enabled new opportunities by changing scaling laws and problem landscapes. Specifically, linearly scaling costs with big data now make it feasible to process large amounts of data, opening up many problems that were previously impossible or too difficult. This has created many "green field" opportunities where simple approaches can solve important problems. Two examples discussed are using log analysis to detect security threats and using transaction histories to find a common point of compromise for a data breach.
SMAC - Presentation from RetailWeek Technology Summit, Sept 23 - AirTight Networks
The document discusses the concept of #SMAC, which stands for social, mobile, analytics, and cloud technologies and how these technologies are transforming businesses. It provides examples of how various retailers have leveraged #SMAC technologies like mobile apps, secure Wi-Fi networks, social media, and cloud-based analytics to improve customer experiences, increase engagement and sales, and reduce IT costs. The document advocates that businesses must adopt #SMAC strategies to remain competitive and highlights how the convergence of these technologies presents opportunities for new business models and customer experiences.
Google Analytics Konferenz 2018_Rock your Data - Aktiviere deine Daten_ Thoma... - e-dialog GmbH
Sound familiar? You have a tracking system on your website and in your app? Maybe you even capture data in stores? So you have a vast amount of relevant data from which you now need to generate smart actions that add value for your company and your customers?
In this talk we show you how to capture and activate data properly with the help of the customer journey and the use of DMPs and CDPs, so that you create not just graphs but a real uplift in ROI and customer value. Naturally, we bring practical examples and use cases along.
Applying Machine learning to IOT: End to End Distributed Pipeline... - Carol McDonald
This discusses the architecture of an end-to-end application that combines streaming data with machine learning to analyze and visualize, in real time, where and when Uber cars are clustered, revealing the most popular Uber locations.
SparkScore (The Social Net Promoter Score): A methodology for measuring socia... - SocialMedia.org
In her Brands-Only Summit presentation, Satmetrix's VP of Innovation and Strategy discusses SparkScore -- the Social Net Promoter Score.
She explains how this social media analytic marries the insight from structured, survey-based solutions to the "social universe" through a single, standard metric.
In September, I presented the Lima Consulting Group Digital Transformation Maturity Model via closed-circuit television to Sanofi employees in the Americas. Here's the material!
This document discusses techniques for detecting advanced persistent threats (APTs). It provides examples of APT attacks and outlines strategies for analyzing event sequences and symbol co-occurrences in large datasets to identify anomalous patterns that can reveal APT activity. Statistical tests like log-likelihood ratio tests are recommended for finding interesting coincidences in tables of symbol co-occurrence data that may indicate security threats.
The document discusses how to generate leads using Brandwatch by refining search queries through an iterative process of starting broad, adding purchase intent phrases and brands, and continually updating the search terms. It recommends setting up alerts and dashboards to manage the incoming leads and integrating with tools like Hootsuite to assign leads efficiently. The key takeaways are to test searches, do outside research on trends, use operators like NEAR to improve relevance, and refine searches on an ongoing basis.
SMAC _ Can It Maximise Staff and Customer Engagement? RWTS - AirTight Networks
@DevinAkin keynote at Retail Week Technology Summit - London, UK - September 26 2013 | RWTS @retailweek
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda... - Codemotion
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries as well so there are lessons for all of us here. Spark plus a streaming architecture can solve these problems very nicely. I will present both a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
Interoperability in a B2B World (NordicAPIS April 2014) - Nordic APIs
The document discusses how B2B integration is evolving from traditional methods like EDI and FTP to use of APIs and web services. It notes that B2B objectives of securely transacting with partners haven't changed, but the technologies used are modernizing from SOA, SOAP, and REST to a focus on APIs. It emphasizes that B2B strategy should include an API strategy and consider both developers and humans. Whiteboarding APIs and dealing with integration challenges are also discussed.
Similar to Building multi-modal recommendation engines using search engines (20)
We introduce the idea that metadata, including project information, data labels, data characteristics and indications of valuable use, can be propagated through a data processing lineage graph. Further, finding examples of significant cooccurrence of propagated and original metadata gives us the basis of an interesting kind of search engine, one that gives interesting recommendations of data given a problem statement, even in a near cold-start situation.
This document discusses progress in using Kubernetes for big data applications. It begins by introducing Kubernetes and explaining its growing popularity due to support from major cloud providers and an open source community. It then discusses some challenges with using containers, particularly around state management. The document proposes using MapR's data platform to provide a global namespace and support for files, streams and tables to address state issues when using Kubernetes for big data applications.
The folk wisdom has always been that when running stateful applications inside containers, the only viable choice is to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it’s considered a problem because stateful containers generally can’t preserve that state across restarts.
In practice, externalizing state complicates the management of large-scale Kubernetes-based infrastructure because these high-performance storage systems require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.
Ted Dunning describes recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures, and access to storage from compute containers is extremely fast. In some environments, it's even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.
The benefits of systems like this extend beyond management simplicity, because applications can be more Agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.
Ellen Friedman and I spoke at the ACM meetup about how stream-first architecture can have a big impact and how the logistics of machine learning is a great example of that impact.
This is my half of the presentation.
Tensor Abuse - how to reuse machine learning frameworks - Ted Dunning
This document discusses tensors and their use in machine learning. It explains that tensors were originally developed for physics but are now commonly used in computing to represent important patterns of computation. Tensors make it easier to code numerical algorithms by capturing operations like element-wise computations, outer products, reductions, and matrix/vector products. Additionally, automatic differentiation is now possible using tensor frameworks, which allows gradients to be computed automatically rather than derived by hand. This has significantly advanced machine learning by enabling new optimization algorithms and the training of complex neural networks. Tensor systems also allow the same code to run on CPUs, GPUs, and clusters, improving productivity.
The logistics of machine learning typically take waaay more effort than the machine learning itself. Moreover, machine learning systems aren't like normal software projects so continuous integration takes on new meaning.
How the Internet of Things is Turning the Internet Upside Down - Ted Dunning
This is a wide-ranging talk that goes into how the internet is architected, how that architecture is changing as a result of the internet of things, how the internet of things worked in the 19th century, big data, the open-source community, and how to build time-series databases to make this all possible.
Really.
Apache Kylin - OLAP Cubes for SQL on Hadoop - Ted Dunning
Apache Kylin (incubating) is a new project to bring OLAP cubes to Hadoop. I walk through the project and describe how it works and how users see the project.
Generating privacy-protected synthetic data using Secludy and Milvus - Zilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Best 20 SEO Techniques To Improve Website Visibility In SERP - Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Monitoring and Managing Anomaly Detection on OpenShift.pdf - Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Taking AI to the Next Level in Manufacturing.pdf - ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Introduction of Cybersecurity with OSS at Code Europe 2024 - Hiroshi SHIBATA
I develop the Ruby programming language as well as RubyGems and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack - shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Programming Foundation Models with DSPy - Meetup Slides - Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx - SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
A Comprehensive Guide to DeFi Development Services in 2024 - Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Fueling AI with Great Data with Airbyte Webinar - Zilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Skybuffer SAM4U tool for SAP license adoption - Tatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Note to speaker: Move quickly through the first two slides just to set the tone of familiar use cases but somewhat complicated under-the-covers math and algorithms… You don't need to explain or discuss these examples at this point… just mention one or two.
Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email….
Talk track: Under the covers, machine learning looks very complicated. So how do you get from here to the familiar examples? Tonight’s presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.
Note to trainers: the next series of slides starts with a cartoon example just to set the pattern of how to find co-occurrence and use it to find indicators of what to recommend. Of course, real examples require a LOT of user-item interaction history to actually work, so this is just an analogy to get the idea across…
* A history of what everybody has done. Obviously this is just a cartoon because large numbers of users and interactions with items would be required to build a recommender.
* Next step will be to predict what a new user might like…
*Bob is the “new user” and getting apple is his history
*Here is where the recommendation engine needs to go to work…
Note to trainer: you might see if the audience calls out the answer before revealing the next slide…
Now you see the idea of co-occurrence as a basis for recommendation…
*Now we have a new user, Amelia. Like everybody else, she gets a pony… what should the recommender offer her based on her history?
* Pony not interesting because it is so widespread that it does not differentiate a pattern
Note to trainer: This is the situation similar to that in which we started, with three users in our history. The difference is that now everybody got a pony. Bob has apple and pony but not a puppy…yet
*Binary matrix is stored sparsely
*Convert by MapReduce into a binary matrix.
Note to trainer: Whether to consider apple to have occurred with itself is an open question.
*Convert by MapReduce into a binary matrix.
Note to trainer: the diagonal gives the total occurrence count for each item (self to self) and is a distraction/not helpful, so the diagonal here is left blank.
Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about co-occurrence.
Note to trainer: Give students time to offer comments. There's a lot to discuss here.
*Upper left: In the context of A, B occurs the largest number of times, 13 times out of 1013 appearances, with over 100,000 samples. But that's only ~1.3% co-occurrence with A out of all the times B appears.
*Upper right: B occurs in the context of A 33% of the time, but the counts are so small as to be of concern.
*Lower right: the most significant anomaly, in that B still occurs a small number of times out of over 100,000 samples, but it ALWAYS co-occurs with A when it does appear.
*The test Mahout uses for this is the Log Likelihood Ratio (LLR).
*The red circle marks the choice that displays the highest confidence.
Note to trainer: Slide animates with a click to show LLR results. A second click animates the choice that has the highest confidence.
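Note to trainer: for reference, the LLR score for a 2x2 co-occurrence table can be computed in a few lines. This sketch follows the entropy formulation used by Mahout's LogLikelihood class; the counts in main are made up to match the "rare but always together" case above.

    public class Llr {
        private static double xLogX(long x) { return x == 0 ? 0.0 : x * Math.log(x); }

        private static double entropy(long... counts) {
            long sum = 0; double xlx = 0;
            for (long c : counts) { sum += c; xlx += xLogX(c); }
            return xLogX(sum) - xlx;   // un-normalized entropy (N * H)
        }

        /** k11 = A and B together, k12 = A without B, k21 = B without A, k22 = neither. */
        public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
            double rowEntropy = entropy(k11 + k12, k21 + k22);
            double colEntropy = entropy(k11 + k21, k12 + k22);
            double matEntropy = entropy(k11, k12, k21, k22);
            return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
        }

        public static void main(String[] args) {
            // B is rare but ALWAYS co-occurs with A: a large LLR despite tiny counts
            System.out.println(logLikelihoodRatio(5, 1995, 0, 98000));
        }
    }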
Note to trainer: we go back to the earlier matrix as a reminder…
The only important co-occurrence is that puppy follows apple.
*Take that row of the matrix and combine it with all the meta data we might have…
*The important thing to get from the co-occurrence matrix is this indicator. Cool thing: this is analogous to what a lot of recommendation engines do.
*This row forms the indicator field in a Solr document containing meta-data (you do NOT have to build a separate index for the indicators).
Find the useful co-occurrence and get rid of the rest. Sparsify and keep the anomalous co-occurrences.
Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
*This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrence).
*Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains the meta data for the item in question.
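Note to trainer: if students ask what posting the indicators looks like in code, here is a minimal SolrJ sketch; the collection name, field names, and values are all hypothetical.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexIndicators {
        public static void main(String[] args) throws Exception {
            SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/items").build();
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "apple");          // the same document that carries the item meta data
            doc.addField("name", "apple");
            doc.addField("indicators", "puppy");  // anomalous co-occurrences found by Mahout
            solr.add(doc);                        // no separate index is needed for the indicators
            solr.commit();
            solr.close();
        }
    }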
This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It's a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine. In other words, do these indicator artists, represented by their indicator IDs, make reasonable recommendations?
Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?
Here we recap what we have in the different components of the recommender. We start with the meta data for an item stored in the Solr index.
*Here we’ve added examples of indicator data for the indicator field(s) of the document
*Here we show you what information might be in the sample query
Note to trainer: you could ask the class to consider which data is related… for example, the first 3 bullets of the query relate to meta data for the item, not to data produced by the recommendation algorithm. The last 3 bullets refer to data in the sample query related to data in the indicator field(s) that were produced by the Mahout recommendation engine.
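Note to trainer: to make the query side concrete, here is a matching SolrJ sketch in which a user's recent history becomes the query against the indicator field; the field and item names are again hypothetical.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class RecommendQuery {
        public static void main(String[] args) throws Exception {
            SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/items").build();
            // The user's history is the query; ordinary relevance ranking does the recommending.
            SolrQuery query = new SolrQuery("indicators:(apple pony)");
            query.setRows(3);
            QueryResponse response = solr.query(query);
            response.getResults().forEach(d -> System.out.println(d.getFieldValue("name")));
            solr.close();
        }
    }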