1©MapR Technologies 2013-
Multi-input Recommendations
2©MapR Technologies 2013-
Me, Us
 Ted Dunning, Chief Application Architect, MapR
Committer PMC member, Mahout, Zookeeper, Drill
Bought the beer at the first HUG
 MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard API’s
 Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
3©MapR Technologies 2013-
Topic For Today
 What is recommendation?
 What makes it different?
 What is multi-model recommendation?
 How can I build it using common household items?
4©MapR Technologies 2013-
Oh … Also This
 Detailed break-down of a live machine learning system running
with Mahout on MapR
 With code examples
5©MapR Technologies 2013-
I may have to
summarize
6©MapR Technologies 2013-
I may have to
summarize
just a bit
7©MapR Technologies 2013-
Part 1:
5 minutes of background
8©MapR Technologies 2013-
Part 2:
10 minutes: I want a pony
9©MapR Technologies 2013-
Part 3:
Implementation Examples
10©MapR Technologies 2013-
11©MapR Technologies 2013-
Part 1:
5 minutes of background
12©MapR Technologies 2013-
What Does Machine Learning Look Like?
13©MapR Technologies 2013-
Recommendations as Machine Learning
 Recommendation:
– Involves observation of interactions between people taking action (users)
and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices;
suggesting sale items for e-stores or via cash-register receipts
14©MapR Technologies 2013-
15©MapR Technologies 2013-
16©MapR Technologies 2013-
Part 2:
How recommenders work
(I still want a pony)
17©MapR Technologies 2013-
Recommendations
Recap:
Behavior of a crowd helps us
understand what individuals will do
18©MapR Technologies 2013-
Recommendations
Alice got an apple and a
puppy
Charles got a bicycle
Alice
Charles
19©MapR Technologies 2013-
Recommendations
Alice got an apple and a
puppy
Charles got a bicycle
Bob got an apple
Alice
Bob
Charles
20©MapR Technologies 2013-
Recommendations
What else would Bob like??
Alice
Bob
Charles
21©MapR Technologies 2013-
Recommendations
A puppy, of course!
Alice
Bob
Charles
22©MapR Technologies 2013-
You get the idea of how
recommenders work…
(By the way, like me, Bob
also wants a pony)
23©MapR Technologies 2013-
Recommendations
What if everybody gets a
pony?
?
Alice
Bob
Charles
Amelia What else would you
recommend for Amelia?
24©MapR Technologies 2013-
Recommendations
?
Alice
Bob
Charles
Amelia
If everybody gets a pony, it’s
not a very good indicator of
what to else predict...
25©MapR Technologies 2013-
Problems with Raw Co-occurrence
 Very popular items co-occur with everything (or why it’s not
very helpful to know that everybody wants a pony…)
– Examples: Welcome document; Elevator music
 Very widespread occurrence is not interesting as a way to
generate indicators
– Unless you want to offer an item that is constantly desired, such as razor
blades (or ponies)
 What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to
base recommendation
26©MapR Technologies 2013-
Get Useful Indicators from Behaviors
1. Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all
potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful co-occurrence by looking for anomalous co-
occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences
can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
27©MapR Technologies 2013-
Log Files
Alice
Bob
Charles
Alice
Bob
Charles
Alice
28©MapR Technologies 2013-
History Matrix: Users by Items
Alice
Bob
Charles
✔ ✔ ✔
✔ ✔
✔ ✔
29©MapR Technologies 2013-
Co-occurrence Matrix: Items by Items
-
1 2
1 1
1
1
2 1
How do you tell which co-occurrences are useful?.
0
0
0 0
30©MapR Technologies 2013-
Co-occurrence Binary Matrix
1
1not
not
1
31©MapR Technologies 2013-
Indicator Matrix: Anomalous Co-Occurrence
✔
Result: The marked row will be added to the indicator
field in the item document…
32©MapR Technologies 2013-
Indicator Matrix
✔
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the Solr
document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to meta-data for a document in
Solr index. You don’t need to create a separate index for the indicators.
33©MapR Technologies 2013-
Internals of the Recommender Engine
33
34©MapR Technologies 2013-
Internals of the Recommender Engine
34
35©MapR Technologies 2013-
A Quick Simplification
 Users who do h (a vector of things a user has done)
 Also do r
Ah
AT
Ah( )
AT
A( )h
User-centric recommendations
(transpose translates back to things)
Item-centric recommendations
(change the order of operations)
A translates things into users
36©MapR Technologies 2013-
Nice. But we
can do better?
37©MapR Technologies 2013-
Ausers
things
38©MapR Technologies 2013-
A1 A2
é
ë
ù
û
users
thing
type 1
thing
type 2
39©MapR Technologies 2013-
A1 A2
é
ë
ù
û
T
A1 A2
é
ë
ù
û=
A1
T
A2
T
é
ë
ê
ê
ù
û
ú
ú
A1 A2
é
ë
ù
û
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
r1
r2
é
ë
ê
ê
ù
û
ú
ú
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
r1 = A1
T
A1 A1
T
A2
é
ëê
ù
ûú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
40©MapR Technologies 2013-
Now again, without
the scary math
41©MapR Technologies 2013-
Part 3:
Implementation Examples
42©MapR Technologies 2013-
For example
 Users enter queries (A)
– (actor = user, item=query)
 Users view videos (B)
– (actor = user, item=video)
 ATA gives query recommendation
– “did you mean to ask for”
 BTB gives video recommendation
– “you might like these videos”
43©MapR Technologies 2013-
The punch-line
 BTA recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
44©MapR Technologies 2013-
Real-life example
 Query: “Paco de Lucia”
 Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
 Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
45©MapR Technologies 2013-
Real-life example
46©MapR Technologies 2013-
Search-based Recommendations
 Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
 Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
Original data
and meta-data
Derived from cooccurrence
and cross-occurrence
analysis
Recommendation
query
47©MapR Technologies 2013-
Me, Us
 Ted Dunning, Chief Application Architect, MapR
Committer PMC member, Mahout, Zookeeper, Drill
Bought the beer at the first HUG
 MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard API’s
 Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR

Using Mahout and a Search Engine for Recommendation

Editor's Notes

  • #13 Note to speaker: Move quickly through 1st two slides just to set the tone of familiar use cases but somewhat complicated under-the-covers math and algorithms… You don’t need to explain or discuss these examples at this point… just mention one or twoTalk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email….
  • #18 Note to trainers: the next series of slides start with a cartoon example just to set the pattern of how to find co-occurrence and use it to find indicators of what to recommend. Of course, real examples require a LOT of data of user-item interaction history to actually work, so this is just an analogy to get the idea across…
  • #19 * A history of what everybody has done. Obviously this is just a cartoon because large numbers of users and interactions with items would be required to build a recommender* Next step will be to predict what a new user might like…
  • #20 *Bob is the “new user” and getting apple is his history
  • #21 *Here is where the recommendation engine needs to go to work…Note to trainer: you might see if audience calls out the answer before revealing next slide…
  • #22 Now you see the idea of co-occurrence as a basis for recommendation…
  • #24 *Now we have a new user, Amelia. Like everybody else, she gets a pony… what should the recommender offer her based on her history?
  • #25 * Pony not interesting because it is so widespread that it does not differentiate a pattern
  • #28 Note to trainer: This is the situation similar to that in which we started, with three users in our history. The difference is that now everybody got a pony. Bob has apple and pony but not a puppy…yet
  • #29 *Binary matrix is stored sparsely
  • #30 *Convert by MapReduce into a binary matrixNote to trainer: Whether consider apple to have occurred with self is open question
  • #31 Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence
  • #32 Only important co-occurrence is puppy follows apple
  • #33 *Take that row of matrix and combine with all the meta data we might have…*Important thing to get from the co-occurrence matrix is this indicator..Cool thing: analogous to what a lot of recommendation engines do*This row forms the indicator field in a Solr document containing meta-data (you do NOT have to build a separate index for the indicators)Find the useful co-occurrence and get rid of the rest. Sparsify and get the anomalous co-occurrence
  • #34 Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
  • #35 *This indicator field is where the output of the Mahout recommendation engine are stored (the row from the indicator matrix that identified significant or interesting co-occurrence. *Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains meta data for the item in question
  • #47 Note to trainer: you could ask the class to consider which data is related… for example, the first 3 bullets of the query relate to meta data for the item, not to data produced by the recommendation algorithm. The last 3 bullets refer to data in the sample query related to data in the indicator field(s) that were produced by the Mahout recommendation engine.