Practical Machine Learning: Innovations in Recommendation Workshop

© 2014 MapR Technologies 2
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
22 June 2014 Big Data Everywhere Conference #DataIsrael

http://www.wired.com/wiredenterprise/2012/12/mahout/
Recommendation: Widely Used Machine Learning
Example: Open source Apache Mahout used in production

Recommendations
– Data: interactions between people taking action (users) and items
• Data used to train recommendation model
– Goal is to suggest additional interactions
– Example applications: movie, music or map-based restaurant choices;
suggesting sale items for e-stores or via cash-register receipts

Google maps: restaurant recommendations

Google maps: tech recommendations

Tutorial Part 1:
How recommendation works,
or “I want a pony”…

First question:
Are you using the right data?

Recommendation
Behavior of a crowd
helps us understand
what individuals will do

Recommendations
Alice got an apple and
a puppyAlice

Recommendations
a puppyAlice
Charles got a bicycleCharles

Recommendations
a puppyAlice
Bob Bob got an apple

Recommendations
a puppyAlice
Bob What else would Bob like?

Recommendations
a puppyAlice
Bob A puppy!

You get the idea of how
recommenders can work…

By the way, like me, Bob also
wants a pony…

Recommendations
?
Alice
Bob
Charles
Amelia
What if everybody gets a
pony?
What else would you recommend for
new user Amelia?

Recommendations
?
Alice
Bob
Charles
Amelia
If everybody gets a pony, it’s
not a very good indicator of
what to else predict...

Problems with Raw Co-occurrence
• Very popular items co-occur with everything or why it’s not very
helpful to know that everybody wants a pony…
– Examples: Welcome document; Elevator music
• Very widespread occurrence is not interesting to generate indicators
for recommendation
– Unless you want to offer an item that is constantly desired, such as
razor blades (or ponies)
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to
base recommendation

Overview: Get Useful Indicators from Behaviors
1. Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all
potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful indicators by identifying anomalous co-occurrences to
make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-
occurrences can with confidence be used as indicators of preference
– ItemSimilarityJob in Apache Mahout uses LLR

Apache Mahout: Overview
• Open source Apache project http://mahout.apache.org/
• Mahout version is 0.9 released Feb 2014; inc Scala
– Summary 0.9 blog at http://bit.ly/1rirUUL
• Library of scalable algorithms for machine learning
– Some run on Apache Hadoop distributions; others do not require Hadoop
– Some can be run at small scale
– Some are run in parallel; others are sequential
• Includes the following main areas:
– Clustering & related techniques
– Classification
– Recommendation
– Mahout Math Library

Log Files
Alice
Bob
Charles
Alice
Bob
Charles
Alice

Log Files
u1
u3
u2
u1
u3
u2
u1
t1
t4
t3
t2
t3
t3
t1

History Matrix: Users x Items
Alice
Bob
Charles
✔ ✔ ✔
✔ ✔
✔ ✔

Co-Occurrence Matrix: Items x Items
1 2 0
1
1 1
1
1
0
00
2
How do you tell which co-
occurrences are useful?

1 2 0
1
1 1
1
1
0
00
2
Use LLR test to turn co-
occurrence into
indicators…

Co-occurrence Binary Matrix
1
1not
not
1

Which one is the anomalous co-occurrence?
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
A not A
B 1 0
not B 0 2

Which one is the anomalous co-occurrence?
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
A not A
B 1 0
not B 0 2
0.90 1.95
4.52 14.3

1 2 0
1
1 1
1
1
0
00
2
Recap:
Use LLR test to turn co-
occurrence into indicators

Indicator Matrix: Anomalous Co-Occurrence
Result:
Each marked row shows
the indicators of what to
recommend…
✔
✔

Indicator Matrix: Anomalous Co-Occurrence
✔
✔
Why not pony + other item?

How will you deliver
recommendations to users?

Seeking Simplicity
Innovation:
Exploit search technology to deploy
your recommendation system

But first, a look at how
search works…

Apache Solr/Apache Lucene
• Apache Solr/Lucene is an open-source powerful search
engine used for flexible, heavily indexed queries including data
such as
– Full text, geographical data, statistically weighted data
• Lucene
– Provides core retrieval
– Is a low-level library
• Solr
– Is a web-based wrapper around Lucene
– Is easy to integrate because you talk to it via a web-interface
• URL http://host machine:8888

LucidWorks
• Enterprise platform and collection of applications based on
Apache Solr/Lucene
– Wrapper around Solr
– A free version ships with MapR
• LucidWorks leaves Solr exposed but makes Solr
administration much easier, which in turn makes it easier to
use Lucene
• URL http://host machine:8989

Solr
LucidWorks
Query Response
Index
Lucene
Query Response
Data Source
Relationship of Solr /Lucene/LucidWorks

Other Options: Elastic Search
• Apache Lucene library at the heart of several approaches
– It can also be used on its own
• Elastic Search is a different web interface for Lucene
– Real time search and analytics
– Open source (not Apache)
– Big advantage is less accumulated cruft
http://www.elasticsearch.com/

What is a Document?
• Data is stored in collections made up of documents
• Documents contain fields that can be
– Indexed
• makes field searchable; don’t have to index all
– Stored
• If you want Solr to return content, have Solr store content
• Not all data of interest must be stored: can access via stored URL (good for very
large data set)
– Multi-value
• Body field can contain more than one type of data
– Facetted
• A way to refine a search or use for statistics
• Example: data for country: Could return facetted as “37 from US, 23 UK, 7 Japan”

Fields and How to Set Them Up
• Lucene is mostly used for text
– Text has to be tokenized
• Also supports other types
– Long, string, keywords, comma separated
• Fields properties such as stored, indexed, faceted can also be
defined
• Defaults aren’t usually so great

Example of Facetted Search
The indexed fields “Area” and “Gender” have been
facetted to provide counts for the results

Field-specific Searches Using Lucene Syntax
• Documents often have title, author, keywords and body text
• General syntax is
field:(term1 term2)
• Alternatives
field:(term1 OR term2) field:(“term1 term2” ~ 5) field:(term1 AND term2)
• Default field, default interpretations work very well for text

Send Data to Solr (not LucidWorks)
• LucidWorks has lots of spiders that can use file extension or
mime type to trigger certain file parsers
– Works best with web-ish sources and mime types
– Also includes MapR and MapR high volume indexers
• More common at modest volumes or for updates to use JSON
format
{"id":"book_314", "title":{"set":"The Call of the Wild"}}
• Use REST interface to send update files
curl http://localhost:8983/solr/update
-H 'Content-type:application/json' --data-binary @file.json

Back to recommendation:
How do you abuse search to make
recommendation easy?

Collection of Documents: Insert Meta-Data
Search
Technology
Item
meta-data
Document for
“puppy” id: t4
title: puppy
desc: The sweetest little puppy
ever.
keywords: puppy, dog, pet
Ingest easily via NFS

From Indicator Matrix to New Indicator Field
✔
id: t4
title: puppy
desc: The sweetest little puppy
ever.
keywords: puppy, dog, pet
indicators: (t1)
Solr document
for “puppy”
Note: data for the indicator field is added directly to meta-data for a document in Apache
Solr or Elastic Search index. You don’t need to create a separate index for the indicators.

Let’s look at a real example:
we built a music recommender

User activity: Listens to classic jazz hit “Take the A Train”

System delivers recommendations based on activity

Let’s look inside the music
recommender…

Music Meta Data for Search Document Collections
• MusicBrainz data
• Data includes Artist ID, MusicBrainz ID, Name, Group/Person, From (geo
locations) and Gender as seen in this sample

Sample User Behavior Histories: Music Log Files
13 START 10113 2182654281
23 BEACON 10113 2182654281
24 START 10113 79600611935028
34 BEACON 10113 79600611935028
44 BEACON 10113 79600611935028
54 BEACON 10113 79600611935028
64 BEACON 10113 79600611935028
74 BEACON 10113 79600611935028
84 BEACON 10113 79600611935028
94 BEACON 10113 79600611935028
104 BEACON 10113 79600611935028
109 FINISH10113 79600611935028
111 START 10113 58999912011972
121 BEACON 10113 58999912011972
Time
Event type
User ID
Artist ID
Track ID

Sample Music Log Files
Artist ID for jazz
musician Duke Ellington
What has user 119
done here in the
highlighted lines?

Internals of a Recommendation Engine

id 1710
mbid 592a3b6d-c42b-4567-99c9-ecf63bd66499
name Chuck Berry
area United States
gender Male
indicator_artists 386685,875994,637954,3418,1344,789739,1460, …
id 541902
mbid 983d4f8f-473e-4091-8394-415c105c4656
name Charlie Winston
area United Kingdom
gender None
indicator_artists 997727,815,830794,59588,900,2591,1344,696268, …
Lucene Documents for Music Recommendation
Notice that data from indicator matrix of trained Mahout recommender
model has been added to indicator field in documents of the artists
collection

Offline Analysis
Analysis Using
Mahout
Users History
Log Files Indicators
Search
Technology
Item
Meta-Data

Log Files
Mahout
Analysis
Search
Technology
Item
Meta-Data
Ingest easily via NFS
MapR Cluster
via NFS Python
Use Python
directly via NFS
Pig
Web
TierRecommendations
New User History
Real-time recommendations using MapR data platform

A Quick Simplification
• Users who do h
• Also do
Ah
AT
Ah( )
AT
A( )h
User-centric recommendations
Item-centric recommendations

Architectural Advantage
AT
Ah( )
AT
A( )h
User-centric recommendations
Item-centric recommendations

Architectural Advantage
AT
Ah( ) User-centric recommendations
With the first design, you have to do the real-time computation first (in
parenthesis). No way to pre-compute. Less efficient, less fast.
With the second design, you can pre-compute offline (overnight)
things that change slowly. Only the smaller computation for new user
vector (h) is done in real-time, so response is very fast.
AT
A( )h Item-centric recommendations

Tutorial Part 2:
How to make recommendation better

Going Further: Multi-Modal Recommendation

For example
• Users enter queries (A)
– (actor = user, item=query)
• Users view videos (B)
– (actor = user, item=video)
• ATA gives query recommendation
– “did you mean to ask for”
• BTB gives video recommendation
– “you might like these videos”

The punch-line
• BTA recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)

Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres de paco” times 400
– not much else
• Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Real-life example

Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users x label clicks
• Remember viewing history
– This gives B = users x items
• Cross recommend
– B’A = label to item mapping
• After several users click, results are whatever users think they
should be

Nice. But we can
do better?

Symmetry Gives Cross Recommentations
AT
A( )h
BT
A( )h
Conventional recommendations with
off-line learning
Cross recommendations

Ausers
things

A1 A2
é
ë
ù
û
users
thing
type 1
thing
type 2

A1 A2
é
ë
ù
û
T
A1 A2
é
ë
ù
û=
A1
T
A2
T
é
ë
ê
ê
ù
û
ú
ú
A1 A2
é
ë
ù
û
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
r1
r2
é
ë
ê
ê
ù
û
ú
ú
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
r1 = A1
T
A1 A1
T
A2
é
ëê
ù
ûú
h1
h2
é
ë
ê
ê
ù
û
ú
ú

Bonus Round:
When worse is better

The Real Issues After First Production
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent

Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual performance
much better

Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual performance
much better
“Made more difference than any other change”

Why Dithering Works
Real-time
recommender
Overnight
training
Log Files

Why Use Dithering?

Simple Dithering Algorithm
• Synthetic score from log rank plus Gaussian
• Pick noise scale to provide desired level of mixing
• Typically
• Also… use floor(t/T) as seed
s = logr + N(0,loge)
Dr
r
µe
e Î 1.5,3[ ]

Example … ε = 2
1 2 8 3 9 15 7 6
1 8 14 15 3 2 22 10
1 3 8 2 10 5 7 4
1 2 10 7 3 8 6 14
1 5 33 15 2 9 11 29
1 2 7 3 5 4 19 6
1 3 5 23 9 7 4 2
2 4 11 8 3 1 44 9
2 3 1 4 6 7 8 33
3 4 1 2 10 11 15 14
11 1 2 4 5 7 3 14
1 8 7 3 22 11 2 33

Lesson:
Exploration is good

Thank you
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
22 June 2014 Big Data Everywhere Conference #DataIsrael

Practical Machine Learning: Innovations in Recommendation Workshop

More Related Content

What's hot

Viewers also liked

Similar to Practical Machine Learning: Innovations in Recommendation Workshop

More from MapR Technologies

Recently uploaded

Practical Machine Learning: Innovations in Recommendation Workshop

Editor's Notes