Personalised Recommendations in E-Commerce

Slide Notes

  • This is also called the cosine metric.
  • We treat the terms as orthogonal.
  • For real-valued matrices, the latent matrix is a diagonal matrix whose entries are the singular values of the term-document matrix. If we order the rows and columns by the size of these singular values, they give the strength of the latent terms. If we only keep the top N of these, we end up with a dramatically smaller collection of three matrices.
  • If you are studying the Computer Science Tripos, you’ll cover this in the Part II Information Retrieval course.
  • We can do the exact same thing for collaborative filtering. In this case: the user matrix is now a mapping from users to the underlying item ‘features’; the item matrix is now a mapping from items to underlying item features; and the latent matrix specifies how strong each feature is.
  • So in this formulation, we again suppose that there is a latent feature space of dimensionality $f$, say. We then try to map each item $i$ onto an item vector $q_i$, and each user $u$ onto a user vector $p_u$, so that user $u$’s predicted score for item $i$ is $\hat{r}_{ui} = q_i^T p_u$. We find $p$ and $q$ by minimising $\sum_{(u,i)} (r_{ui} - q_i^T p_u)^2 + \lambda (\|q_i\|^2 + \|p_u\|^2)$: the first term is the squared error, and the second is a regularisation term that penalises us if $p$ and $q$ are too complex.
  • In this example, 6 customers bought products C and D. However, NSCORE(D,C) = 0.26 while NSCORE(C,D) = 0.29. This is because D was bought more often, so a co-purchase with D means less. (In the table, each NSCORE row is the corresponding SCORE row normalised to sum to 1: NSCORE(D,C) = 6/(12+5+6) ≈ 0.26, NSCORE(C,D) = 6/(10+5+6) ≈ 0.29.)
  • It turns out, if you think about it, that item-based CF ignores a lot of the graph structure. For example, it knows how many customers bought A and B together, and it knows how many customers bought A and C together. But it doesn’t know how many customers bought A, B and C together, or how many customers bought A and B but not C.
  • We can consider an edge to be: user B bought products 4 and 5. Or we can consider it the other way round: product 5 is bought by B and E. This suggests that 4 and 5 might be similar in some way, and maybe B and E are similar users.
  • We can solve the cold-start problems by augmenting the graph. Suppose we are trying to find recommendations for a user G who hasn’t bought any products, but we know they are similar to users A and B. Then we can run random walks starting at A and B instead.
  • For the cold-start product problem, if we know that products 1 and 3 are related (e.g. same brand), then we can add a dummy node and link 1 and 3 to it. That way, a random walk that reaches product 1 can then reach product 3 via the dummy node.
  • We don’t have time to go into it now, but we are very interested in DrunkardMob. It allows us to do billions of short random walks instead of millions of longer random walks. Why do we care? Because this potentially gives us a general framework for other types of algorithms too!
  • Any questions?

Transcript

  • 1. THE WORLD’S LEADING ONLINE HEALTH & BEAUTY DESTINATION
  • 2. BUSINESS MODEL 300M+ LIFESTYLE CONSUMERS GLOBALLY PROPRIETARY DATA TECHNOLOGY PLATFORM HEALTH BEAUTY HIGH REPEAT, HIGH MARGIN & LOW RETURNS
  • 3. CUSTOMER METRICS 5 MILLION CUSTOMERS 2.5 MILLION SHIPPED IN LAST 12 MONTHS 140 MILLION VISITS PER ANNUM 8 MILLION ORDERS PER ANNUM
  • 4. TALENT [Infographic: data/technology personnel, average age of employees, average age of divisional heads, and # of grads hired in LTM; figures shown: 25, 30, 75, 280]
  • 5. SCALABILITY: VISITS, ORDERS, AVERAGE ORDER VALUE [Charts] Visits (Millions): 21 (2009), 140 (2013E), 400 (2017E). Orders (Millions): 2 (2009), 8 (2013E), 22 (2017E). AOV (£): 42 (2009), 46 (2013E), 50 (2017E).
  • 6. SCALABILITY: HOW [Charts: investment in talent, investment in capacity] Tech Headcount: 33 (2009), 155 (2013E), 350 (2017E). Orders (Millions): 2 (2009), 8 (2013E), 22 (2017E). £40M INVESTMENT.
  • 7. Personalised Recommendations for E-Commerce Wing Yung Chan Applications of Computing in Industry
  • 8. Recommendation = “Suggestion to do something” INTRODUCTION
  • 9. Watch a film Read a book Buy a drink Visit a museum INTRODUCTION
  • 10. Types of recommendations… INTRODUCTION
  • 11. Restaurant Waiter: • Special of the day • My personal favourites • Bestsellers • Do you like Chicken? INTRODUCTION
  • 12. Clothes Store Attendant: • New-in/Seasonal Highlights • Special Offers/Discounts • Bestsellers • Are you looking for anything in particular? INTRODUCTION
  • 13. Quick Taxonomy • Business self-interest: new-in, specials, … • General preference: bestsellers • More personal: search INTRODUCTION
  • 14. None of these are personal except search. But search requires effort and a way to articulate what you need/want. INTRODUCTION
  • 15. Solution: Add in assumptions about Personal Preference to create Personalised Recommendations INTRODUCTION
  • 16. Assumption #1 You are like your friends. INTRODUCTION
  • 17. Assumption #2 You are like people who do similar things to you. INTRODUCTION
  • 18. Assumption #3 You like things that are similar to things you already do. INTRODUCTION
  • 19. Assumption #4 You are influenced by experts or the experiences of others. INTRODUCTION
  • 20. INTRODUCTION
  • 21. How can we program this? INTRODUCTION
  • 22. INTRODUCTION
  • 23. Problem Statement An entry is filled with the number of times a user has bought an item. It could also be the rating given to an item. [Matrix: users A–F as rows, items 1–6 as columns; sparse entries of 1 mark purchases.] PROBLEM STATEMENT
  • 24. Problem Statement There are usually too many items and/or too many users. The matrix is large but very sparse. It’s also incomplete. [Same user–item matrix as before.] PROBLEM STATEMENT
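
Because the matrix is mostly empty, it is usually stored in a sparse format rather than as a dense array. A minimal sketch in Python with SciPy (the user/item indices below are illustrative, not the actual matrix from the slide):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Purchases as (user, item) pairs; we store only the non-zero entries,
# so memory grows with the number of purchases, not users * items.
users = np.array([0, 1, 1, 2, 3, 4, 5])   # e.g. A=0, B=1, ..., F=5
items = np.array([0, 3, 4, 1, 2, 4, 5])   # e.g. products 1-6 -> 0-5
counts = np.ones(len(users))

R = csr_matrix((counts, (users, items)), shape=(6, 6))
print(R.nnz, "non-zero entries out of", R.shape[0] * R.shape[1])
print(R.toarray())   # dense view, only sensible for tiny examples
```
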
  • 25. Audience Check Before we get into algorithms, just a quick check: • Machine Learning • Singular Value Decomposition/Latent Semantic Analysis • Clustering/Community Detection • Page Rank and Random-Walks • Item-Based Collaborative Filtering ALGORITHMS
  • 26. Machine Learning: a program whose results improve with data. Aim: predict a score for each user-product pair so that we can pick the best products to recommend to a user. Data: the previous interactions between users and products. ALGORITHMS
  • 27. ALGORITHMS
  • 28. But we need to find products for users. So we need users and products to be mapped to the same space. ALGORITHMS
  • 29. Singular Value Decomposition [Diagram: the terms × documents matrix M factorises as M = W × P, where W maps terms onto latent dimensions and P maps latent dimensions onto documents.] ALGORITHMS
  • 30. Now we just replace terms with users and documents with products. ALGORITHMS
  • 31. Singular Value Decomposition [Diagram: the users × products matrix M factorises as M = W × P, where W maps users onto latent dimensions and P maps latent dimensions onto products.] ALGORITHMS
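
As a sketch of the truncation described in the notes (numpy’s dense SVD on a toy matrix; a real system would use a sparse or iterative solver):

```python
import numpy as np

# Toy users x products matrix (illustrative counts).
M = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 0., 0.],
              [0., 1., 1., 0.]])

# Full SVD: M = U @ diag(s) @ Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the top-k latent dimensions: three much smaller matrices.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(M_k, 2))   # the best rank-k approximation of M
```
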
  • 32. ALGORITHMS
  • 33. However… Traditional SVD methods fail when the matrix is incomplete. ALGORITHMS
  • 34. ALGORITHMS
  • 35. ALGORITHMS
  • 36. Alternating Least Squares 1. Solve for p, using random values for q. 2. Then solve for q, using latest version of p. 3. Repeat to solve for p using latest version of q, etc. ALGORITHMS
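
A minimal ALS sketch for the objective in the notes (toy data; f, λ and the iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, f, lam = 6, 6, 2, 0.1

# Observed entries only, as (user, item, rating); the rest are missing.
obs = [(0, 0, 1.0), (1, 3, 1.0), (1, 4, 1.0), (2, 1, 1.0), (4, 4, 1.0)]

P = rng.normal(size=(n_users, f))   # user vectors p_u
Q = rng.normal(size=(n_items, f))   # item vectors q_i

def solve_rows(fixed, pairs_by_row, n_rows):
    # Regularised least squares for each row, holding the other factor fixed.
    out = np.zeros((n_rows, f))
    for row, pairs in pairs_by_row.items():
        A = lam * np.eye(f)
        b = np.zeros(f)
        for other, r in pairs:
            A += np.outer(fixed[other], fixed[other])
            b += r * fixed[other]
        out[row] = np.linalg.solve(A, b)
    return out

by_user = {u: [(i, r) for (uu, i, r) in obs if uu == u] for u in range(n_users)}
by_item = {i: [(u, r) for (u, ii, r) in obs if ii == i] for i in range(n_items)}

for _ in range(10):
    P = solve_rows(Q, by_user, n_users)   # solve for p with q fixed
    Q = solve_rows(P, by_item, n_items)   # then solve for q with p fixed

print(np.round(P @ Q.T, 2))   # predicted scores r_ui = q_i . p_u
```
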
  • 37. SGD: usually faster. ALS: more easily parallelised. These methods give up the singular, orthonormal guarantees of SVD but work with incomplete data. ALGORITHMS
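
For comparison, the corresponding SGD update on the same objective (a sketch; the learning rate and epoch count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
f, lam, lr = 2, 0.1, 0.05
P = rng.normal(scale=0.1, size=(6, f))    # user vectors p_u
Q = rng.normal(scale=0.1, size=(6, f))    # item vectors q_i
obs = [(0, 0, 1.0), (1, 3, 1.0), (1, 4, 1.0), (2, 1, 1.0), (4, 4, 1.0)]

for _ in range(200):                      # epochs over the observed entries
    for u, i, r in obs:
        err = r - Q[i] @ P[u]             # error in r_ui ~ q_i . p_u
        P[u] += lr * (err * Q[i] - lam * P[u])   # gradient step on p_u
        Q[i] += lr * (err * P[u] - lam * Q[i])   # gradient step on q_i

print(np.round(P @ Q.T, 2))               # predicted scores
```
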
  • 38. Item-based Collaborative Filtering “Amazon.com recommendations: item-to-item collaborative filtering”, 2003. “Customers Who Bought This Also Bought That” ALGORITHMS
  • 39. Item-Based CF [Diagram: example purchase counts of 1000, 100 and 2.] ALGORITHMS
  • 40. Item-Based CF ALGORITHMS
  • 41. Item-Based CF

        SCORE    A     B     C     D
        A        -     3    10     7
        B        3     -     5     6
        C       10     5     -     6
        D       12     5     6     -

        NSCORE   A     B     C     D
        A        -   0.15  0.50  0.35
        B      0.21    -   0.36  0.43
        C      0.48  0.24    -   0.29
        D      0.52  0.22  0.26    -

    ALGORITHMS
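
In the tables above, each NSCORE row is the corresponding SCORE row divided by its row total; a quick check in numpy:

```python
import numpy as np

# SCORE matrix from the slide (diagonal entries treated as 0).
S = np.array([[ 0.,  3., 10.,  7.],
              [ 3.,  0.,  5.,  6.],
              [10.,  5.,  0.,  6.],
              [12.,  5.,  6.,  0.]])

# Divide each row by its total: popular products then count for less.
N = S / S.sum(axis=1, keepdims=True)
print(np.round(N, 2))
# Row D: 6 / (12 + 5 + 6) = 0.26, but row C: 6 / (10 + 5 + 6) = 0.29,
# which is why NSCORE(D, C) < NSCORE(C, D).
```
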
  • 42. Item-Based CF Algorithm We can then recommend products to customers as follows:

        For each customer:
            For each product bought by the customer:
                Find the top N recommended products
            Take the top M products, ordered by the sum of the scores

    This is efficient: O(U + I) space.
    1. We store N recommended products per product: size = Items × N.
    2. We store the products bought by each customer (usually a low constant): size = Users × M (M small).

    ALGORITHMS
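
A minimal sketch of that loop in Python (the neighbour lists and purchase history are illustrative stand-ins; the scores are taken from the NSCORE table above):

```python
from collections import defaultdict

# Precomputed top-N similar products per product: product -> [(other, score)].
neighbours = {
    "A": [("C", 0.50), ("D", 0.35), ("B", 0.15)],
    "C": [("A", 0.48), ("B", 0.24), ("D", 0.29)],
}

def recommend(bought, neighbours, m=2):
    """Rank candidates by the sum of their scores against everything bought."""
    totals = defaultdict(float)
    for product in bought:
        for other, score in neighbours.get(product, []):
            if other not in bought:            # don't re-recommend owned items
                totals[other] += score
    return sorted(totals, key=totals.get, reverse=True)[:m]

print(recommend({"A", "C"}, neighbours))       # -> ['D', 'B']
```
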
  • 43. Item-CF collapses the User–Product matrix into a Product–Product graph. But is this the only way to do it? ALGORITHMS
  • 44. Graph-Based The user-product matrix looks like an adjacency matrix for a bipartite graph, where entries specify edges. In fact, this makes intuitive sense too. [Diagram: bipartite graph with users A–F on one side and products 1–6 on the other.] ALGORITHMS
  • 45. Graph-Based Maybe if B and E are similar, we should be recommending Product 3 to B and Product 4 to E. This line of thinking is called neighbourhood methods. [Same bipartite graph.] ALGORITHMS
  • 46. Graph-Based We can randomly walk the graph, starting at a user node and walking an odd number of steps so that we always end up at a product. [Same bipartite graph.] ALGORITHMS
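
A sketch of such a walk (only the B-4, B-5 and E-5 edges are from the slides; the rest of the toy graph is made up, and edge weights are taken as uniform):

```python
import random
from collections import Counter

# Bipartite adjacency lists: users A-F and products 1-6.
edges = {
    "A": ["1", "2"], "B": ["4", "5"], "C": ["2"],
    "D": ["3"], "E": ["5"], "F": ["6"],
    "1": ["A"], "2": ["A", "C"], "3": ["D"],
    "4": ["B"], "5": ["B", "E"], "6": ["F"],
}

def walk(start, steps):
    """Uniform random walk; an odd number of steps from a user ends on a product."""
    node = start
    for _ in range(steps):
        node = random.choice(edges[node])
    return node

# Tally where 3-step walks from user B land, to rank candidate products.
print(Counter(walk("B", 3) for _ in range(10000)).most_common())
```
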
  • 47. Problems 1. Cold Start (Users): Users may not have bought many products. 2. Cold Start (Products): A Product may not have been bought very often (or could be new so had no chance to be bought). This causes a problem for both Latent methods and Neighbourhood methods. Possible solutions: 1. Link users to other users using social or demographic data. 2. Link products to other products using taxonomy information like brand, category, description. ALGORITHMS
  • 48. Augmented Graph Recommending for a new customer G: [Diagram: the bipartite graph with a new user node G linked to the similar users A and B.] ALGORITHMS
  • 49. Augmented Graph Adding brand connections for 1 and 3. [Diagram: the bipartite graph with a dummy brand node X linked to products 1 and 3.] ALGORITHMS
  • 50. Personalised PageRank PPR for a given start node S. If you reach a node u, then move to one of the adjacent nodes v with probability (1 − a), and back to S with probability a. If there are N adjacent nodes, pick one of them according to the distribution of weights on the edges. This can’t be done efficiently using the typical power method for global PageRank, because it would require a separate run of the power method for every start node. ALGORITHMS
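
A sketch of estimating PPR by simulation rather than the power method (toy graph; the restart probability a = 0.15 is a conventional choice, not from the slides):

```python
import random
from collections import Counter

# Weighted adjacency lists: node -> [(neighbour, edge weight)].
edges = {
    "S": [("u", 1.0), ("v", 2.0)],
    "u": [("S", 1.0), ("v", 1.0)],
    "v": [("S", 2.0), ("u", 1.0)],
}

def ppr_estimate(start, a=0.15, n_walks=20000, steps=30):
    """Approximate PPR(start) by the empirical distribution of a restart walk."""
    visits = Counter()
    for _ in range(n_walks):
        node = start
        for _ in range(steps):
            if random.random() < a:                 # jump back to S
                node = start
            else:                                   # move to a weighted neighbour
                nbrs, weights = zip(*edges[node])
                node = random.choices(nbrs, weights=weights)[0]
        visits[node] += 1
    return {n: c / n_walks for n, c in visits.items()}

print(ppr_estimate("S"))
```
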
  • 51. This can be approximated efficiently on a single machine using DrunkardMob. See “DrunkardMob: billions of random walks on just a PC”, by Aapo Kyrola. ALGORITHMS
  • 52. Item-Based CF can be approximated by a 2-step Random Walk starting from each Product. We can even do User Clustering or Community Detection by doing Random Walks starting at users and taking an even number of steps. ALGORITHMS
  • 53. Competition Time! Can you write the best recommendation algorithm? The Hut Group is offering £5,000 and paid summer internships to the team (up to 5 people) of university students that makes the best recommendations. Dataset: • 2.2M rows • 150,000 Customers • 500 Products Rec Challenge 2013
  • 54. Thanks Go to www.thehutchallenge.com and enter your e-mail address today! wingyung.chan@thehutgroup.com Twitter: @MrWingChan Rec Challenge 2013