THE WORLD’S
LEADING ONLINE
HEALTH & BEAUTY
DESTINATION
BUSINESS MODEL
300M+ LIFESTYLE CONSUMERS GLOBALLY

PROPRIETARY DATA
TECHNOLOGY PLATFORM

HEALTH

BEAUTY

HIGH REPEAT,
HIGH...
CUSTOMER METRICS
5 MILLION CUSTOMERS
2.5 MILLION SHIPPED IN LAST 12 MONTHS
140 MILLION VISITS PER ANNUM
8 MILLION ORDERS PER ANNUM
TALENT
[Infographic: four figures (25, 30, 75, 280) with the labels Data/Technology Personnel, Average Age of Employees, Average Age of Divisional Heads, # of Grads Hired in LTM]
SCALABILITY: VISITS, ORDERS, AVERAGE ORDER VALUE
[Charts for 2009, 2013E and 2017E: Visits (Millions), Orders (Millions), AOV (£)]
SCALABILITY: HOW
INVESTMENT IN TALENT
INVESTMENT IN CAPACITY
£40M INVESTMENT
[Charts for 2009, 2013E and 2017E: Tech Headcount, Orders (Millions)]
Personalised
Recommendations for
E-Commerce
Wing Yung Chan

Applications of Computing
in Industry
Recommendation
=
“Suggestion to do
something”
INTRODUCTION
Watch a film
Read a book
Buy a drink
Visit a museum
INTRODUCTION
Types of
recommendations…

INTRODUCTION
Restaurant Waiter:
• Special of the day
• My personal favourites
• Bestsellers
• Do you like Chicken?
INTRODUCTION
Clothes Store Attendant:
• New-in/Seasonal Highlights
• Special Offers/Discounts
• Bestsellers
• Are you looking for anything in particular?
Quick Taxonomy
Business self-interest:
New-in, specials,…
General preference:
Bestsellers
More personal:
Search
INTRODUCTION
None of these are personal
except search.

But search requires effort
and a way to articulate what
you need/want.
INTRODUCTION
Solution:
Add in assumptions about
Personal Preference to
create Personalised
Recommendations
INTRODUCTION
Assumption #1
You are like your friends.

INTRODUCTION
Assumption #2
You are like people who do
similar things to you.

INTRODUCTION
Assumption #3
You like things that are similar to things
you already do.

INTRODUCTION
Assumption #4
You are influenced by experts or the
experiences of others.

INTRODUCTION
INTRODUCTION
How can we program this?

INTRODUCTION
INTRODUCTION
Problem Statement
An entry is filled with the number of times a user has bought an
item.
It could also be the rating given to an item.
Problem Statement
There are usually too many items and/or too many users.
The matrix is large but very sparse. It’s also incomplete.
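A minimal sketch of how such a sparse user-item matrix might be held in memory, storing only the entries that exist. The purchase log, the function name `build_user_item_counts` and the 6 x 6 size are illustrative assumptions, not data from the talk.

```python
from collections import defaultdict


def build_user_item_counts(purchases):
    """Return counts[user][item] = number of times the user bought the item."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, item in purchases:
        counts[user][item] += 1
    # Convert to plain dicts so that missing entries stay missing.
    return {user: dict(items) for user, items in counts.items()}


# Hypothetical purchase log: one (user, item) pair per purchase.
purchases = [("A", 1), ("A", 3), ("A", 3), ("B", 4), ("B", 5), ("E", 3), ("E", 5)]
matrix = build_user_item_counts(purchases)

# Assumed matrix size: 6 users by 6 items.
n_users, n_items = 6, 6
filled = sum(len(items) for items in matrix.values())
density = filled / (n_users * n_items)
```

Only the filled cells take up space, which is what makes a very large but sparse matrix manageable.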
Audience Check
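Before we get into algorithms, just a quick check:
• Machine Learning
• Singular Value Decomposition/Latent Semantic Analysis
• Clustering/Community Detection
• Page Rank and Random-Walks
• Item-Based Collaborative Filtering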
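Machine Learning: A program whose results improve with data.
Aim: predict a score for each user-product so that we can pick the best products to recommend to a user.
Data: The previous interactions between users and products.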
ALGORITHMS
But we need to find products
for users. So we need users
and products to be mapped
to the same space.

ALGORITHMS
Singular Value Decomposition
[Diagram: the terms × documents matrix M decomposed into three matrices: a terms matrix, a diagonal latent matrix and a documents matrix]

ALGORITHMS
Now we just replace terms
with users and documents
with products.

ALGORITHMS
Singular Value Decomposition
[Diagram: the users × products matrix M decomposed into three matrices: a users matrix, a diagonal latent matrix and a products matrix]

ALGORITHMS
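A small sketch of the decomposition on a complete matrix, keeping only the strongest latent features. The matrix values and the choice `k = 2` are illustrative assumptions. `numpy.linalg.svd` returns the singular values sorted largest first.

```python
import numpy as np

# Hypothetical, fully observed user-product matrix (rows: users, columns: products).
M = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
])

# M = U @ diag(s) @ Vt: U maps users to latent features, Vt maps latent
# features to products, and s holds the strength of each latent feature.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the top k latent features to get three much smaller matrices.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Here the two retained features correspond to the two groups of users and products; the discarded features carry the small within-group differences.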
ALGORITHMS
However…
Traditional SVD methods fail when the
matrix is incomplete.
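The speaker notes describe the alternative: find item vectors q_i and user vectors p_u such that the score r_ui is approximated by q_i^T p_u, with a squared error term and a regularisation term. The slide formula is not in the extracted text, so the following is the commonly used form, given here as an assumption. K (the set of observed user-item pairs) and λ (the regularisation weight) are names introduced for illustration.

```latex
\min_{p,\,q} \sum_{(u,i) \in K} \left[ \left( r_{ui} - q_i^{T} p_u \right)^2
  + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right) \right]
```

Because the sum runs only over observed entries, missing entries do not need to be filled in before fitting.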

ALGORITHMS
ALGORITHMS
ALGORITHMS
Alternating Least Squares
1. Solve for p, using random values for q.
2. Then solve for q, using the latest version of p.
3. Repeat to solve for p using the latest version of q, etc.
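The three steps above can be sketched as follows. This is a minimal version under stated assumptions: a regularised least-squares solve per user and per item, a boolean `mask` marking observed entries, and illustrative values for `f`, `lam` and `iters`. None of these names or values come from the talk.

```python
import numpy as np


def als(R, mask, f=2, lam=0.1, iters=50, seed=0):
    """Factorise R ~ P @ Q.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = np.zeros((n_users, f))
    Q = rng.normal(scale=0.1, size=(n_items, f))  # random starting values for q
    reg = lam * np.eye(f)
    for _ in range(iters):
        # Fix Q and solve a small regularised least-squares problem per user.
        for u in range(n_users):
            seen = mask[u]
            if seen.any():
                Qs = Q[seen]
                P[u] = np.linalg.solve(Qs.T @ Qs + reg, Qs.T @ R[u, seen])
        # Fix P and solve the same kind of problem per item.
        for i in range(n_items):
            seen = mask[:, i]
            if seen.any():
                Ps = P[seen]
                Q[i] = np.linalg.solve(Ps.T @ Ps + reg, Ps.T @ R[seen, i])
    return P, Q


# Hypothetical ratings; 0 marks a missing entry, not a rating of zero.
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])
mask = R > 0
P, Q = als(R, mask)
rmse = float(np.sqrt(np.mean((R - P @ Q.T)[mask] ** 2)))
```

Each per-user and per-item solve is independent of the others within a step, which is why the method parallelises well.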
SGD: usually faster.

ALS: easily parallelisable.
These methods give up the Singular, Orthonormal guarantee of SVD but work with incomplete data.
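For comparison, a minimal stochastic gradient descent sketch that fits the same kind of factorisation one observed rating at a time. The learning rate `lr`, regularisation weight `lam`, epoch count and ratings are illustrative assumptions rather than values from the talk.

```python
import random


def sgd_factorise(ratings, n_users, n_items, f=2, lr=0.02, lam=0.05,
                  epochs=500, seed=0):
    """Fit user vectors P and item vectors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]
    ratings = list(ratings)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(ratings)
        for u, i, r in ratings:
            err = r - sum(pk * qk for pk, qk in zip(P[u], Q[i]))
            for k in range(f):
                pu, qi = P[u][k], Q[i][k]
                # Step against the gradient of squared error plus regularisation.
                P[u][k] += lr * (err * qi - lam * pu)
                Q[i][k] += lr * (err * pu - lam * qi)
    return P, Q


# Hypothetical observed ratings as (user index, item index, rating).
ratings = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 2, 1.0),
    (2, 0, 1.0), (2, 2, 5.0), (2, 3, 4.0),
    (3, 1, 1.0), (3, 2, 4.0), (3, 3, 5.0),
]
P, Q = sgd_factorise(ratings, n_users=4, n_items=4)
sq_errors = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
rmse = (sum(sq_errors) / len(sq_errors)) ** 0.5
```

Only observed ratings are visited, so missing entries never enter the loss.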
Item-based Collaborative Filtering
“Amazon.com recommendations: item-to-item
collaborative filtering”, 2003.
“Customers Who Bought This Also Bought That”
Item-Based CF
1000

2

100

ALGORITHMS
Item-Based CF

ALGORITHMS
Item-Based CF

SCORE
       A      B      C      D
A      -      3     10      7
B      3      -      5      6
C     10      5      -      6
D     12      5      6      -

NSCORE
       A      B      C      D
A      -    0.15   0.50   0.35
B    0.21     -    0.36   0.43
C    0.48   0.24     -    0.29
D    0.52   0.22   0.26     -
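The slide does not say how NSCORE is derived from SCORE. Dividing each row by its row total reproduces the figures shown, so the sketch below assumes that normalisation. The function name `normalise_rows` is illustrative.

```python
# Co-purchase counts copied from the SCORE table (diagonal omitted).
score = {
    "A": {"B": 3, "C": 10, "D": 7},
    "B": {"A": 3, "C": 5, "D": 6},
    "C": {"A": 10, "B": 5, "D": 6},
    "D": {"A": 12, "B": 5, "C": 6},
}


def normalise_rows(score):
    """Divide every entry by its row total so that each row sums to 1."""
    nscore = {}
    for item, row in score.items():
        total = sum(row.values())
        nscore[item] = {other: count / total for other, count in row.items()}
    return nscore


nscore = normalise_rows(score)
```

Under this assumption, the same 6 co-purchases of C and D give a different score in each direction, because the row totals for C (21) and D (23) differ.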
Item-Based CF Algorithm
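We can then recommend products to customers as follows:
For each customer:
    For each product bought by the customer:
        Find the top N recommended products
    Take the top M products, ordered by the sum of the scores
This is efficient: O(U + I) space:
1. We store N recommended products per product: Size = Items * N
2. We store the products bought by each customer, usually a low constant: Size = Users * M (M small)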
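The recommendation loop can be sketched as below for a single customer. The `similar` table holds the top N = 2 entries per product from the NSCORE table. Skipping products the customer already owns and breaking ties alphabetically are assumptions added here, not something the slide states.

```python
from collections import defaultdict


def recommend(bought, similar, top_m=3):
    """Sum the scores of the stored neighbours of each bought product."""
    totals = defaultdict(float)
    for product in bought:
        for other, s in similar.get(product, []):
            if other not in bought:  # assumption: never re-recommend a purchase
                totals[other] += s
    # Highest total first; ties broken by product name for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_m]]


# Top N = 2 recommended products per product, taken from the NSCORE table.
similar = {
    "A": [("C", 0.50), ("D", 0.35)],
    "B": [("D", 0.43), ("C", 0.36)],
    "C": [("A", 0.48), ("D", 0.29)],
    "D": [("A", 0.52), ("C", 0.26)],
}
recs = recommend({"A", "B"}, similar, top_m=2)
```

At serving time this needs only the per-product neighbour lists and the customer's own purchases, which matches the storage estimate above.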
Item-CF collapses the User-Product matrix into a Product-Product graph.
But is this the only way to do it?

ALGORITHMS
Graph-Based
The user-product matrix looks like an adjacency matrix for a bipartite
graph, where entries specify edges. In fact, this makes intuitive sense too.
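A minimal sketch of reading the matrix as a bipartite graph. The purchases are hypothetical, since the actual edges are only visible on the slide image, and tagging nodes as `("user", ...)` or `("product", ...)` is an illustrative choice to keep the two sides apart.

```python
from collections import defaultdict

# Hypothetical sparse user-product matrix: matrix[user][product] = purchase count.
matrix = {"A": {1: 1, 2: 1}, "B": {4: 1, 5: 1}, "E": {3: 1, 5: 1}}


def to_bipartite(matrix):
    """Turn each non-empty matrix entry into an undirected weighted edge."""
    graph = defaultdict(dict)
    for user, row in matrix.items():
        for product, count in row.items():
            graph[("user", user)][("product", product)] = count
            graph[("product", product)][("user", user)] = count
    return dict(graph)


graph = to_bipartite(matrix)
```

Every edge joins a user node to a product node, so any path through the graph alternates between the two sides.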
Graph-Based
Maybe if B and E are similar, we should be recommending Product 3 to
B and Product 4 to E. This line of thinking is called Neighbourhood methods.
Graph-Based
We can randomly walk the graph, starting at a User Node and walk an
odd number of steps so we always end up at a Product.
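A minimal sketch of such a walk with unweighted, uniform steps. The purchases are hypothetical, chosen so that B and E share Product 5 as in the example above. The function name `walk` and the number of walks are illustrative.

```python
import random
from collections import Counter

# Hypothetical purchases: user -> products bought.
purchases = {"A": [1, 2], "B": [4, 5], "E": [3, 5]}

# Reverse index: product -> users who bought it.
bought_by = {}
for user, products in purchases.items():
    for product in products:
        bought_by.setdefault(product, []).append(user)


def walk(start_user, steps, rng):
    """Take an odd number of uniform random steps and return the final product."""
    if steps % 2 != 1:
        raise ValueError("steps must be odd to finish on a product")
    product = rng.choice(purchases[start_user])
    for _ in range((steps - 1) // 2):
        user = rng.choice(bought_by[product])
        product = rng.choice(purchases[user])
    return product


rng = random.Random(0)
visits = Counter(walk("B", 3, rng) for _ in range(10_000))
# Products reached that B has not bought are the candidate recommendations.
new_for_b = {p: n for p, n in visits.items() if p not in purchases["B"]}
```

In this toy graph the only unbought product reachable from B in three steps is Product 3, via Product 5 and user E.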
Problems
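1. Cold Start (Users): Users may not have bought many products.
2. Cold Start (Products): A Product may not have been bought very often (or could be new, so has had no chance to be bought).
This causes a problem for both Latent methods and Neighbourhood methods.
Possible solutions:
1. Link users to other users using social or demographic data.
2. Link products to other products using taxonomy information like brand, category, description.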
Augmented Graph
Recommending for a new customer G:
[Graph: users A to G and products 1 to 6]

ALGORITHMS
Augmented Graph
Adding brand connections for 1 and 3.
[Graph: users A to F, products 1 to 6, and a dummy brand node X linked to products 1 and 3]

ALGORITHMS
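A small sketch of the brand augmentation: a dummy node is linked to products 1 and 3 so that a walk arriving at product 1 can continue to product 3. The graph contents, node names and edge weight are illustrative assumptions.

```python
from collections import deque


def add_dummy_node(graph, dummy, members, weight=1.0):
    """Link every member node to a shared dummy node, in both directions."""
    graph.setdefault(dummy, {})
    for node in members:
        graph.setdefault(node, {})[dummy] = weight
        graph[dummy][node] = weight
    return graph


def reachable(graph, start):
    """Return every node that can be reached from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical graph: product 3 is new and has not been bought by anyone yet.
graph = {
    "user:A": {"product:1": 1.0},
    "product:1": {"user:A": 1.0},
    "product:3": {},
}
before = reachable(graph, "user:A")
add_dummy_node(graph, "brand:X", ["product:1", "product:3"])
after = reachable(graph, "user:A")
```

Before the dummy node is added, no walk from user A can reach product 3; afterwards it can, via product 1 and the brand node.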
Personalised PageRank
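PPR for a given start node S.
If you reach a node u, then move to one of the adjacent nodes v with probability (1-a), and back to S with probability a.
If there are N adjacent nodes, then pick one of them according to the distribution of weights on the edges.
This can’t be done efficiently using the typical Power method for Global PageRank because it would require N iterations of the Power method.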
This can be approximated efficiently on a single machine using DrunkardMob.
See “DrunkardMob: billions of random walks on just a PC”, by Aapo Kyrola.
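The walk rule for Personalised PageRank can be illustrated with a naive single-walker simulation. This is not DrunkardMob, only a plain sketch of the restart rule. The graph, the restart probability `alpha` (the slide's a) and the step count are illustrative assumptions.

```python
import random
from collections import Counter


def personalised_pagerank(graph, start, alpha=0.15, steps=100_000, seed=0):
    """Estimate PPR scores as visit frequencies of one long restarting walk."""
    rng = random.Random(seed)
    visits = Counter()
    node = start
    for _ in range(steps):
        visits[node] += 1
        neighbours = graph[node]
        if not neighbours or rng.random() < alpha:
            node = start  # jump back to the start node S
        else:
            # Pick an adjacent node in proportion to the edge weights.
            nodes = list(neighbours)
            weights = [neighbours[v] for v in nodes]
            node = rng.choices(nodes, weights=weights, k=1)[0]
    return {n: count / steps for n, count in visits.items()}


# Hypothetical weighted bipartite graph (weights are purchase counts).
graph = {
    "B": {"p4": 1, "p5": 2},
    "E": {"p3": 1, "p5": 1},
    "p3": {"E": 1},
    "p4": {"B": 1},
    "p5": {"B": 2, "E": 1},
}
scores = personalised_pagerank(graph, "B")
```

Products that B has not bought can then be ranked by their score, so nodes closer to B in the graph rank higher.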
Item-Based CF can be approximated by a 2-step Random Walk starting from each Product.
We can even do User Clustering or Community Detection by doing Random Walks starting at users and taking an even number of steps.
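To see why a 2-step walk resembles Item-Based CF, the sketch below computes the exact distribution of where such a walk ends, rather than sampling it. The purchases and the function name are hypothetical, and each step is assumed to be uniform over neighbours.

```python
from collections import defaultdict

# Hypothetical purchases: user -> products bought.
purchases = {
    "u1": ["A", "C"],
    "u2": ["A", "C"],
    "u3": ["A", "D"],
    "u4": ["B", "D"],
}


def two_step_distribution(purchases, start):
    """Probability of ending on each product: product -> buyer -> product."""
    buyers = [u for u, products in purchases.items() if start in products]
    dist = defaultdict(float)
    for user in buyers:
        for product in purchases[user]:
            dist[product] += (1 / len(buyers)) * (1 / len(purchases[user]))
    return dict(dist)


dist = two_step_distribution(purchases, "A")

# Drop the walk returning to its own start and renormalise the rest.
others = {p: v for p, v in dist.items() if p != "A"}
total = sum(others.values())
walk_scores = {p: v / total for p, v in others.items()}
```

Here A was co-purchased twice with C and once with D, and the walk gives C and D weights of 2/3 and 1/3, the same as row-normalised co-purchase counts. The match is exact in this example only because every user bought the same number of products.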
Competition Time!
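Can you write the best Recommendation Algorithm?
The Hut Group is offering £5,000 and paid Summer Internships to the best team (up to 5 people) of University Students who can make the best recommendations.
Dataset:
• 2.2M rows
• 150,000 Customers
• 500 Products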
Thanks

Go to www.thehutchallenge.com and enter your
e-mail address today!

wingyung.chan@thehutgroup.com
Twitter: @MrWingChan
Personalised Recommendations in E-Commerce
  • This is also called the cosine metric.
  • We treat the terms as orthogonal.
  • For real-valued matrices, the latent matrix is a diagonal matrix whose entries are the singular values of the Term-Document matrix. If we order the rows and columns by the size of these singular values, each value gives the strength of the corresponding latent term. If we only take the top N of these, we end up with a dramatically smaller collection of three matrices.
  • If you are studying Computer Science Tripos, you’ll cover this in the Part II Information Retrieval Course.
  • We can do the exact same thing for Collaborative Filtering. In this case: the User matrix is now a mapping from users to the underlying item ‘features’. The Item matrix is now a mapping from items to underlying item features. The Latent matrix specifies how strong each feature is.
  • So in this formulation, we again suppose that there is a latent feature space of dimensionality f, say. Then we are trying to map an item i onto an item vector q_i and a user u onto a user vector p_u. Then user u’s score for product i is r_ui and equals q_i^T p_u. So we try to find the matrices p and q. The first term is the squared error term and the second is a regularisation term that will penalise us if p and q are too complex.
  • In this example, 6 customers bought products C and D. However, NSCORE(D,C) = 0.26 and NSCORE(C,D) = 0.29. This is because D was bought more often, so each co-purchase counts for less of D’s total.
  • It turns out, if you think about it, that Item-CF ignores a lot of the graph structure. For example, it knows how many customers bought A and B together. It also knows how many customers bought A and C together. But it doesn’t know how many customers bought A and B and C together, or how many customers bought A and B but not C.
  • We can consider an edge to be: User B bought products 4 and 5. Or we can consider it the other way round: product 5 is bought by B and E. This suggests that 4 and 5 might be similar in some way, and maybe B and E are similar users.
  • We can solve the cold start problems by augmenting the graph. Suppose we are trying to find recommendations for a user G who hasn’t bought any products but we know they are similar to users A and B. So we can run random walks starting at A and B instead.
  • For the cold start product problem, if we know that products 1 and 3 are related (e.g. same brand), then we can add a dummy node and link 1 and 3 to it. That way, a random walk that reaches product 1 can then reach product 3 via the dummy node.
  • We don’t have time to go into it now but we are very interested in DrunkardMob. It allows us to do billions of short random walks instead of millions of longer random walks. Why do we care? Because this potentially allows us a general framework to do other types of algorithms too!
  • Any questions?
  • Personalised Recommendations in E-Commerce

    1. 1. THE WORLD’S LEADING ONLINE HEALTH & BEAUTY DESTINATION
    2. 2. BUSINESS MODEL 300M+ LIFESTYLE CONSUMERS GLOBALLY PROPRIETARY DATA TECHNOLOGY PLATFORM HEALTH BEAUTY HIGH REPEAT, HIGH MARGIN & LOW RETURNS
    3. 3. CUSTOMER METRICS 5 MILLION CUSTOMERS 2.5 MILLION SHIPPED IN LAST 12 MONTHS 140 MILLION VISITS PER ANNUM 8 MILLION ORDERS PER ANNUM
    4. 4. TALENT 25 30 75 280 DATA/TECHNOLOGY PERSONNEL AVERAGE AGE OF EMPLOYEES AVERAGE AGE OF DIVISIONAL HEADS # OF GRADS HIRED IN LTM
    5. 5. SCALABILITY: VISITS ORDERS AVERAGE ORDER VALUE 400 22 50 42 46 2009 21 8 140 2 2013E 2017E Visits (Millions) 2009 2013E 2017E Orders (Millions) 2009 2013E AOV (£) 2017E
    6. 6. SCALABILITY: HOW INVESTMENT IN TALENT INVESTMENT IN CAPACITY 22 350 £40M INVESTMENT 155 8 2 33 2009 2013E 2017E Tech Headcount 2009 2013E 2017E Orders (Millions)
    7. 7. Personalised Recommendations for E-Commerce Wing Yung Chan Applications of Computing in Industry
    8. 8. Recommendation = “Suggestion to do something” INTRODUCTION
    9. 9. Watch a film Read a book Buy a drink Visit a museum INTRODUCTION
    10. 10. Types of recommendations… INTRODUCTION
    11. 11. Restaurant Waiter: • • • • Special of the day My personal favourites Bestsellers Do you like Chicken? INTRODUCTION
    12. 12. Clothes Store Attendant: • • • • New-in/Seasonal Highlights Special Offers/Discounts Bestsellers Are you looking for anything in particular? INTRODUCTION
    13. 13. Quick Taxonomy Business self-interest: New-in, specials,… General preference: Bestsellers More personal: Search INTRODUCTION
    14. 14. None of these are personal except search. But search requires effort and a way to articulate what you need/want. INTRODUCTION
    15. 15. Solution: Add in assumptions about Personal Preference to create Personalised Recommendations INTRODUCTION
    16. 16. Assumption #1 You are like your friends. INTRODUCTION
    17. 17. Assumption #2 You are like people who do similar things to you. INTRODUCTION
    18. 18. Assumption #3 You like things that are similar to things you already do. INTRODUCTION
    19. 19. Assumption #4 You are influenced by experts or the experiences of others. INTRODUCTION
    20. 20. INTRODUCTION
    21. 21. How can we program this? INTRODUCTION
    22. 22. INTRODUCTION
    23. 23. Problem Statement An entry is filled with the number of times a user has bought an item. It could also be the rating given to an item. ITEMS 1 A 2 3 5 1 6 1 1 B USERS 4 C 1 D 1 E F 1 1 1 PROBLEM STATEMENT
    24. 24. Problem Statement There are usually too many items and/or too many users. The matrix is large but very sparse. It’s also incomplete. ITEMS 1 A 2 3 5 1 6 1 1 B USERS 4 C 1 D 1 E F 1 1 1 PROBLEM STATEMENT
    25. 25. Audience Check Before we get into algorithms, just a quick check: • Machine Learning • Singular Value Decomposition/Latent Semantic Analysis • Clustering/Community Detection • Page Rank and Random-Walks • Item-Based Collaborative Filtering ALGORITHMS
    26. 26. Machine Learning : A program whose results improve with data. Aim: predict a score for each user-product so that we can pick the best products to recommend to a user. Data: The previous interactions between users and products. ALGORITHMS
    27. 27. ALGORITHMS
    28. 28. But we need to find products for users. So we need users and products to be mapped to the same space. ALGORITHMS
    29. 29. Singular Value Decomposition documents latent documents M WxP = M terms terms M W P M ALGORITHMS
    30. 30. Now we just replace terms with users and documents with products. ALGORITHMS
    31. 31. Singular Value Decomposition products latent products M WxP = M users users M W P M ALGORITHMS
    32. 32. ALGORITHMS
    33. 33. However… Traditional SVD methods fail when the matrix is incomplete. ALGORITHMS
    34. 34. ALGORITHMS
    35. 35. ALGORITHMS
    36. 36. Alternating Least Squares 1. Solve for p, using random values for q. 2. Then solve for q, using latest version of p. 3. Repeat to solve for p using latest version of q, etc. ALGORITHMS
    37. 37. SGD: usually faster. ALS: can be easily parallelizable. These methods give up the Singular, Orthonormal guarantee of SVD but work with incomplete data. ALGORITHMS
    38. 38. Item-based Collaborative Filtering “Amazon.com recommendations: item-to-item collaborative filtering”, 2003. “Customers Who Bought This Also Bought That” ALGORITHMS
    39. 39. Item-Based CF 1000 2 100 ALGORITHMS
    40. 40. Item-Based CF ALGORITHMS
    41. 41. Item-Based CF
        SCORE
               A      B      C      D
        A      -      3     10      7
        B      3      -      5      6
        C     10      5      -      6
        D     12      5      6      -
        NSCORE
               A      B      C      D
        A      -    0.15   0.50   0.35
        B    0.21     -    0.36   0.43
        C    0.48   0.24     -    0.29
        D    0.52   0.22   0.26     -
        ALGORITHMS
    42. 42. Item-Based CF Algorithm
        We can then recommend products to customers as follows:
        For each customer:
            For each product bought by the customer:
                Find the top N recommended products
            Take the top M products, ordered by the sum of the scores
        This is efficient: O(U + I) space:
        1. We store N recommended products per product: Size = Items * N
        2. We store the products bought by each customer, usually a low constant: Size = Users * M (M small)
        ALGORITHMS
    43. 43. Item-CF collapses the User-Product matrix into a Product-Product graph. But is this the only way to do it? ALGORITHMS
    44. 44. Graph-Based The user-product matrix looks like an adjacency matrix for a bipartite graph, where entries specify edges. In fact, this makes intuitive sense too. A 1 B 2 C 3 D 4 E 5 F 6 ALGORITHMS
    45. 45. Graph-Based Maybe if B and E are similar, we should be recommending Product 3 to B and Product 4 to E. This line of thinking is called Neighbourhood methods. A 1 B 2 C 3 D 4 E 5 F 6 ALGORITHMS
    46. 46. Graph-Based We can randomly walk the graph, starting at a User Node and walk an odd number of steps so we always end up at a Product. A 1 B 2 C 3 D 4 E 5 F 6 ALGORITHMS
    47. 47. Problems 1. Cold Start (Users): Users may not have bought many products. 2. Cold Start (Products): A Product may not have been bought very often (or could be new so had no chance to be bought). This causes a problem for both Latent methods and Neighbourhood methods. Possible solutions: 1. Link users to other users using social or demographic data. 2. Link products to other products using taxonomy information like brand, category, description. ALGORITHMS
    48. 48. Augmented Graph Recommending for a new customer G: A 1 B 2 C 3 D 4 E 5 F 6 G ALGORITHMS
    49. 49. Augmented Graph Adding brand connections for 1 and 3. A 1 B 2 C 3 D 4 E 5 F 6 X ALGORITHMS
    50. 50. Personalised PageRank PPR for a given start node S. If you reach a node u, Then move to one of the adjacent nodes v with probability (1-a), and back to S with probability a. If there are N adjacent nodes, then pick one of them according to the distribution of weights on the edges. This can’t be done efficiently using the typical Power method for Global PageRank because it would require N iterations of the Power method. ALGORITHMS
    51. 51. This can be approximated efficiently on a single machine using DrunkardMob. See “DrunkardMob: billions of random walks on just a PC”, by Aapo Kyrola. ALGORITHMS
    52. 52. Item-Based CF can be approximated by a 2-step Random Walk starting from each Product. We can even do User Clustering or Community Detection by doing Random Walks starting at users and taking an even number of steps. ALGORITHMS
    53. 53. Competition Time! Can you write the best Recommendation Algorithm? The Hut Group is offering £5,000 and paid Summer Internships to the best team (up to 5 people) of University Students who can make the best recommendations. Dataset: • 2.2M rows • 150,000 Customers • 500 Products Rec Challenge 2013
    54. 54. Thanks Go to www.thehutchallenge.com and enter your e-mail address today! wingyung.chan@thehutgroup.com Twitter: @MrWingChan Rec Challenge 2013
