For real-valued matrices, the latent matrix is a diagonal matrix whose entries are the singular values of the Term-Document matrix. If we order the rows and columns by the size of these singular values, each entry gives the strength of the corresponding latent term. If we only take the top N of these, we end up with a dramatically smaller collection of three matrices.
If you are studying the Computer Science Tripos, you'll cover this in the Part II Information Retrieval course.
We can do the exact same thing for Collaborative Filtering. In this case: the User matrix is now a mapping from users to the underlying item 'features'; the Item matrix is now a mapping from items to underlying item features; and the Latent matrix specifies how strong each feature is.
So in this formulation, we again suppose that there is a latent feature space of dimensionality $f$, say. Then we are trying to map each item $i$ onto an item vector $q_i$, and each user $u$ onto a user vector $p_u$. User $u$'s predicted score for item $i$ is then $\hat{r}_{ui} = q_i^T p_u$. So we try to find the matrices $p$ and $q$ by minimising

$$\min_{p,q} \sum_{(u,i)} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right)$$

The first term is the squared error and the second is a regularisation term that penalises us if $p$ and $q$ are too complex.
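A minimal sketch of minimising this objective with stochastic gradient descent, which the slides later contrast with ALS (the function name, learning rate, and toy data are illustrative assumptions, not values from the talk):

```python
import numpy as np

def sgd_factorise(ratings, n_users, n_items, f=10, lr=0.01, lam=0.1, epochs=20):
    """Fit r_ui ~ q_i^T p_u from observed (user, item, rating) triples."""
    rng = np.random.default_rng(0)
    p = rng.normal(scale=0.1, size=(n_users, f))   # user vectors p_u
    q = rng.normal(scale=0.1, size=(n_items, f))   # item vectors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = p[u].copy(), q[i].copy()
            err = r - qi @ pu                       # prediction error
            p[u] += lr * (err * qi - lam * pu)      # gradient step on p_u
            q[i] += lr * (err * pu - lam * qi)      # gradient step on q_i
    return p, q

# Toy usage: three observed interactions among 2 users and 3 items.
p, q = sgd_factorise([(0, 0, 5.0), (0, 2, 1.0), (1, 1, 4.0)], 2, 3)
print(np.round(p @ q.T, 1))                         # predicted score matrix
```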
In this example, 6 customers bought products C and D. However, NSCORE(D,C) = 0.26 while NSCORE(C,D) = 0.29. This is because D was bought more often, so each co-purchase with D carries less weight (NSCORE divides each row of SCORE by that row's total).
It turns out, if you think about it, that ItemCF ignores a lot of the graph structure. For example, it knows how many customers bought A and B together. It also knows how many customers bought A and C together. But it doesn't know how many customers bought A and B and C together, or how many customers bought A and B but not C.
We can consider an edge to be: User B bought products 4 and 5. Or we can consider it the other way round: product 5 is bought by B and E. This suggests that 4 and 5 might be similar in some way, and maybe B and E are similar users.
We can solve the cold start problems by augmenting the graph. Suppose we are trying to find recommendations for a user G who hasn't bought any products, but we know they are similar to users A and B. So we can run random walks starting at A and B instead.
For the cold start product problem, if we know that products 1 and 3 are related (e.g. same brand), then we can add a dummy node and link 1 and 3 to it. That way, a random walk that reaches product 1 can then reach product 3 via the dummy node.
We don't have time to go into it now, but we are very interested in DrunkardMob. It allows us to do billions of short random walks instead of millions of longer random walks. Why do we care? Because this potentially gives us a general framework for other types of algorithms too!
Any questions?
Transcript
1.
THE WORLD'S LEADING ONLINE HEALTH & BEAUTY DESTINATION
2.
BUSINESS MODEL
300M+ LIFESTYLE CONSUMERS GLOBALLY
PROPRIETARY DATA
TECHNOLOGY PLATFORM
HEALTH & BEAUTY
HIGH REPEAT, HIGH MARGIN & LOW RETURNS
3.
CUSTOMER METRICS
5 MILLION CUSTOMERS
2.5 MILLION SHIPPED IN LAST 12 MONTHS
140 MILLION VISITS PER ANNUM
8 MILLION ORDERS PER ANNUM
4.
TALENT
280 DATA/TECHNOLOGY PERSONNEL
25 AVERAGE AGE OF EMPLOYEES
30 AVERAGE AGE OF DIVISIONAL HEADS
75 GRADS HIRED IN LTM
11.
Restaurant Waiter:
• Special of the day
• My personal favourites
• Bestsellers
• Do you like Chicken?
12.
Clothes Store Attendant:
• New in/Seasonal Highlights
• Special Offers/Discounts
• Bestsellers
• Are you looking for anything in particular?
13.
Quick Taxonomy
Business self-interest: New in, specials, …
General preference: Bestsellers
More personal: Search
14.
None of these are personal except search. But search requires effort and a way to articulate what you need/want.
15.
Solution: Add in assumptions about Personal Preference to create Personalised Recommendations.
16.
Assumption #1: You are like your friends.
17.
Assumption #2: You are like people who do similar things to you.
18.
Assumption #3: You like things that are similar to things you already do.
19.
Assumption #4: You are influenced by experts or the experiences of others.
23.
Problem Statement
An entry is filled with the number of times a user has bought an item. It could also be the rating given to an item.
[Figure: sparse user-item matrix with users A-F as rows and items 1-6 as columns; a few cells hold purchase counts.]
24.
Problem Statement
There are usually too many items and/or too many users. The matrix is large but very sparse. It's also incomplete.
[Figure: the same sparse user-item matrix as before.]
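As a concrete illustration, a matrix like this is usually stored sparsely; a minimal sketch using scipy (the example entries are made up, not the ones on the slide):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Users A-F as rows 0-5, items 1-6 as columns 0-5.
# Each triple is (user, item, purchase count) - illustrative values only.
rows = np.array([0, 0, 1, 2, 4, 5])
cols = np.array([0, 4, 2, 3, 1, 5])
counts = np.array([1, 2, 1, 1, 3, 1])

# CSR stores only the non-zero entries, so even a very large matrix
# with a handful of purchases per user stays small in memory.
user_item = csr_matrix((counts, (rows, cols)), shape=(6, 6))
print(user_item.toarray())
```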
25.
Audience Check
Before we get into algorithms, just a quick check:
• Machine Learning
• Singular Value Decomposition/Latent Semantic Analysis
• Clustering/Community Detection
• PageRank and Random Walks
• Item-Based Collaborative Filtering
26.
Machine Learning: a program whose results improve with data.
Aim: predict a score for each user-product pair so that we can pick the best products to recommend to a user.
Data: the previous interactions between users and products.
36.
Alternating Least Squares
1. Solve for p, using random values for q.
2. Then solve for q, using the latest version of p.
3. Repeat to solve for p using the latest version of q, etc.
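A minimal sketch of that alternating loop for the regularised objective above, assuming a dense ratings matrix for brevity (real implementations solve only over the observed entries; the toy data is illustrative):

```python
import numpy as np

def als_step(R, fixed, lam=0.1):
    """Least-squares solve for one factor matrix with the other held fixed.

    Each row x of the result solves (fixed^T fixed + lam*I) x = fixed^T r.
    """
    f = fixed.shape[1]
    A = fixed.T @ fixed + lam * np.eye(f)
    return np.linalg.solve(A, fixed.T @ R.T).T

rng = np.random.default_rng(0)
R = rng.integers(0, 2, size=(6, 5)).astype(float)   # toy user-item counts
q = rng.normal(size=(5, 3))                         # 1. random item factors
for _ in range(10):
    p = als_step(R, q)                              # 2. solve for p, q fixed
    q = als_step(R.T, p)                            # 3. solve for q, p fixed
print(np.round(p @ q.T, 2))                         # reconstructed scores
```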
37.
SGD: usually faster.
ALS: easily parallelised.
These methods give up the singular, orthonormal guarantees of SVD, but work with incomplete data.
38.
Item-Based Collaborative Filtering
"Amazon.com recommendations: item-to-item collaborative filtering", 2003.
"Customers Who Bought This Also Bought That"
41.
Item-Based CF

SCORE
      A     B     C     D
A     -     3    10     7
B     3     -     5     6
C    10     5     -     6
D    12     5     6     -

NSCORE
      A     B     C     D
A     -   0.15  0.50  0.35
B   0.21    -   0.36  0.43
C   0.48  0.24    -   0.29
D   0.52  0.22  0.26    -
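NSCORE here is consistent with normalising each row of SCORE by its total (e.g. C's row sums to 21, and 6/21 ≈ 0.29), so frequently bought items contribute less per co-purchase. A small sketch of that normalisation:

```python
import numpy as np

# Co-purchase counts for items A-D (diagonal unused).
score = np.array([
    [ 0,  3, 10,  7],
    [ 3,  0,  5,  6],
    [10,  5,  0,  6],
    [12,  5,  6,  0],
], dtype=float)

# Divide each row by its total: larger row sums (more popular items)
# mean each individual co-purchase carries less weight.
nscore = score / score.sum(axis=1, keepdims=True)
print(np.round(nscore, 2))  # matches the NSCORE table above
```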
42.
Item-Based CF Algorithm
We can then recommend products to customers as follows:
For each customer:
    For each product bought by the customer:
        Find the top N recommended products
    Take the top M products, ordered by the sum of the scores
This is efficient: O(U + I) space.
1. We store N recommended products per product: Size = Items × N.
2. We store the products bought by each customer (usually a low constant): Size = Users × M (M small).
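A minimal sketch of that recommendation loop, assuming the per-product top-N neighbour lists have already been computed (the names and data are illustrative):

```python
from collections import defaultdict

# Precomputed top-N neighbours per product: product -> [(neighbour, nscore), ...]
neighbours = {
    "A": [("C", 0.50), ("D", 0.35)],
    "C": [("A", 0.48), ("D", 0.29)],
}

def recommend(bought, neighbours, m=2):
    """Sum neighbour scores over everything the customer bought, take top M."""
    totals = defaultdict(float)
    for product in bought:
        for other, score in neighbours.get(product, []):
            if other not in bought:          # don't re-recommend purchases
                totals[other] += score
    return sorted(totals, key=totals.get, reverse=True)[:m]

print(recommend({"A", "C"}, neighbours))     # -> ['D']
```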
43.
ItemCF collapses the User-Product matrix into a Product-Product graph.
But is this the only way to do it?
44.
Graph-Based
The user-product matrix looks like an adjacency matrix for a bipartite graph, where entries specify edges. In fact, this makes intuitive sense too.
[Figure: bipartite graph with user nodes A-F on one side and product nodes 1-6 on the other.]
45.
Graph-Based
Maybe if B and E are similar, we should be recommending Product 3 to B and Product 4 to E. This line of thinking is called Neighbourhood methods.
[Figure: the same bipartite graph.]
46.
Graph-Based
We can randomly walk the graph, starting at a User node and walking an odd number of steps so we always end up at a Product.
[Figure: the same bipartite graph.]
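A minimal sketch of such a walk, assuming an adjacency-list representation of the bipartite graph (the edges and step count are illustrative):

```python
import random

# Bipartite adjacency: users B, E <-> products 4, 5, 6 (edges illustrative).
graph = {
    "B": [4, 5], "E": [5, 6], 4: ["B"], 5: ["B", "E"], 6: ["E"],
}

def random_walk(graph, start, steps=3):
    """Walk an odd number of steps from a user, landing on a product."""
    node = start
    for _ in range(steps):
        node = random.choice(graph[node])
    return node

random.seed(1)
counts = {}
for _ in range(1000):                     # many walks approximate relevance
    product = random_walk(graph, "B")
    counts[product] = counts.get(product, 0) + 1
print(counts)                             # most-visited products for user B
```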
47.
Problems
1. Cold Start (Users): Users may not have bought many products.
2. Cold Start (Products): A product may not have been bought very often (or could be new, so had no chance to be bought).
This causes a problem for both Latent methods and Neighbourhood methods.
Possible solutions:
1. Link users to other users using social or demographic data.
2. Link products to other products using taxonomy information like brand, category, description.
48.
Augmented Graph
Recommending for a new customer G:
[Figure: the bipartite graph with a new user node G linked to existing similar users.]
49.
Augmented Graph
Adding brand connections for 1 and 3.
[Figure: the bipartite graph with a dummy brand node X linked to products 1 and 3.]
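A small sketch of both augmentations on the adjacency-list graph from before (the node names follow the slides; the specific edges are illustrative):

```python
# Start from the walkable bipartite graph (illustrative edges).
graph = {
    "A": [1], "B": [4, 5], 1: ["A"], 3: [], 4: ["B"], 5: ["B"],
}

# Cold-start user: link new customer G to similar users A and B,
# so walks from G flow through their purchase histories.
# (This adds user-user edges, so step-parity bookkeeping changes.)
graph["G"] = ["A", "B"]
graph["A"].append("G")
graph["B"].append("G")

# Cold-start product: link products 1 and 3 via a dummy brand node X,
# so a walk reaching product 1 can also reach product 3.
graph["X"] = [1, 3]
graph[1].append("X")
graph[3].append("X")
```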
50.
Personalised PageRank
PPR for a given start node S:
If you reach a node u, then move to one of the adjacent nodes v with probability (1 − a), and back to S with probability a.
If there are N adjacent nodes, then pick one of them according to the distribution of weights on the edges.
This can't be done efficiently using the typical Power method for Global PageRank, because it would require a separate Power-method run for every possible start node.
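Instead, PPR can be approximated by Monte Carlo: run many short walks from S, restarting with probability a at each step, and count where the walks end. A minimal sketch (unweighted edges for simplicity; the restart probability and walk count are illustrative):

```python
import random
from collections import Counter

def approx_ppr(graph, s, a=0.15, n_walks=10000):
    """Approximate Personalised PageRank for start node s via random walks."""
    visits = Counter()
    for _ in range(n_walks):
        node = s
        while random.random() > a:             # continue with probability 1-a
            node = random.choice(graph[node])  # uniform over neighbours
        visits[node] += 1                      # walk restarts back at s here
    total = sum(visits.values())
    return {node: count / total for node, count in visits.items()}
```

Because each walk's length is geometrically distributed, the distribution of walk endpoints converges to the PPR vector for S as the number of walks grows.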
51.
This can be approximated efficiently on a single machine using DrunkardMob.
See "DrunkardMob: billions of random walks on just a PC", by Aapo Kyrola.
52.
Item-Based CF can be approximated by a 2-step Random Walk starting from each Product.
We can even do User Clustering or Community Detection by doing Random Walks starting at users and taking an even number of steps.
53.
Competition Time!
Can you write the best Recommendation Algorithm?
The Hut Group is offering £5,000 and paid Summer Internships to the team (up to 5 people) of University Students who can make the best recommendations.
Dataset:
• 2.2M rows
• 150,000 Customers
• 500 Products
54.
Thanks
Go to www.thehutchallenge.com and enter your email address today!
wingyung.chan@thehutgroup.com
Twitter: @MrWingChan