Graph Based Recommendation Systems at eBay

Recommendation and personalization systems are an important part of many modern websites. Graphs provide a natural way to represent the behavioral data that is the core input to many recommendation algorithms. Thomas Pinckney and his colleagues at Hunch (recently acquired by eBay) built a large-scale recommendation system and then ported the technology to eBay. Thomas will discuss how his team uses Cassandra to provide high-I/O storage for their fifty-billion-edge graphs and how they generate new recommendations in real time as users click around the site.


  1. Modeling taste with Cassandra. Affinity is based on user tastes, preferences, and interests.
  2. What is a taste profile? Operational definition: the set of things you like and dislike. [Diagram: "Stuff I like" and "Stuff I don’t like".] Challenge: how do you build the taste profile for someone?
  3. Thesis: likes are correlated.
  4. Inferring correlations. 1) User A: Democrat; likes arugula. 2) User B: Republican; dislikes arugula. 3) User C indicates: Democrat. What would we infer is User C’s affinity for arugula? Answer: User C would like arugula. [Diagram: graph with nodes A, B, C, D, and E, with the unknown edge marked "?".]
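  5. Inferring correlations. [Scatter plot: horizontal axis from "Dislike Obama" to "Like Obama", vertical axis from "Dislike arugula" to "Like arugula". Labels and points as extracted: User A <3, 2.5>; <1, 1>; User B <-2, -1.5>; <-3, -3>; User C <2, ?>.] If someone’s affinity for Obama is 2.0, what is their affinity for arugula?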
  6. Discovering latent factors. [Scatter plot: same Obama and arugula axes, with a latent axis running from "Liberal" to "Conservative". Labels and points as extracted: Obama; Arugula; <5, 5>; <4, 4>; User A <3, 2>; <1, 1>; User B <-2, -1.5>; <-3, -3>; Iceberg; <-4, -4>; GOP; <-5, -5>; User C <2, 1.5>.] Predict 1.5 for how much this person will like arugula.
  7. Taste space = many latent factors. [Diagram: axes Liberal to Conservative, Extroverted to Introverted, and Masculine to Feminine, with users A and B. Coordinates as extracted: <0.7, 4.4, -.1>; <0.5, 2.4, -.4>; <-0.5, -3.1, 0.1>.]
  8. What is a taste profile? Operational definition: a coordinate in taste space, refining the earlier definition (the set of things you like and dislike). Stuff I like is close to me in taste space; stuff I don’t like is far away in taste space. Challenge: how do you calculate taste coordinates?
  9. Calculating taste coordinates. Edge weight = dot product of the two node coordinates, to constrain similar items to be close to each other. Assume edge weights of +2 = “love” and -2 = “hate”. The Democratic node <x, y> must solve: 1*x - 2*y = 2 (edge from A) and 1*x - 1*y = 2 (edge from C). Solution = <2, 0>. [Graph as extracted: A <1, -2>, B <-1, 2>, C <1, -1>, D <x, y>, E <1, -0.5>; edge weights 2, 2, 2, and -2.]
  10. Updating taste coordinates. User A purchases a camera... [Before and after graphs. In the extracted labels, A appears to move from <1, -2> to <0.75, -2.5>, while B <-1, 2>, C <1, -1>, and the coordinates <1, -0.5> and <-1, 0.5> appear in both graphs; the after graph has one additional edge of weight 2.] Resulting in blue coordinates changing.
  11. v1 System overview - Model updates. 1) Receive event (e.g., Purchase). 2a) Write Purchase edge. 2b) Read other edges for this user and item. 3) Write user and item coordinates. [Diagram components: Rec. Engine, Updater, Reco. DB, Taste graph (User -> coord, Item -> coord).]
  12. v1 System overview - Rec serving. 1) Page load requests recommendations. 2) Rec. engine finds other cameras close to user’s coordinates. 3) Recommendations shown to user. [Diagram components: Rec. Engine, Updater, Reco. DB, Taste graph (User -> coord, Item -> coord).]
  13. v1 Taste Graph data size: 40 billion edges; 2 billion item nodes; 200 million user nodes; 5TB of data, which takes up 10TB with a Replication Factor of 2. We expect this to quadruple next year as we get more events and add new types of edges.
  14. v1 Taste Graph DB configuration: 32 Linux machines (128GB RAM, 1TB iSCSI SSD, 10 GigE NIC); Cassandra version 1.0.8; 8GB JVM heap space; size-tiered compaction strategy.
  15. v1 Taste Graph schema. User Edges: row key user_id; columns (timestamp, edge_type, item_id) …, each with value <empty>. Item Edges: row key item_id; columns (timestamp, edge_type, user_id) …, each with value <empty>. User Nodes: row key user_id; column tastevector, 200 bytes (50 floats). Item Nodes: row key item_id; column tastevector, 200 bytes (50 floats).
  16. v1 Real-time taste updates. [Chart: edges and nodes read per second.]
  17. v1 Real-time taste updates. [Chart: edges and nodes written per second.]
  18. Questions? tp@hunch.com
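
The calculation on slide 9 can be reproduced in a few lines. This is a minimal sketch, not the speaker's implementation: it assumes two-dimensional coordinates and exactly two known neighbors, and the helper names `dot` and `solve_node_2d` are hypothetical. It solves the slide's two equations with Cramer's rule.

```python
def dot(u, v):
    """Dot product of two equal-length coordinate tuples."""
    return sum(a * b for a, b in zip(u, v))


def solve_node_2d(neighbors, weights):
    """Find <x, y> such that dot(neighbor, <x, y>) equals each edge weight.

    Hypothetical helper: handles only two neighbors in two dimensions,
    using Cramer's rule on the resulting 2x2 linear system.
    """
    (a1, b1), (a2, b2) = neighbors
    w1, w2 = weights
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("neighbor coordinates are collinear; no unique solution")
    x = (w1 * b2 - w2 * b1) / det
    y = (a1 * w2 - a2 * w1) / det
    return (x, y)


# Values from slide 9: A = <1, -2> and C = <1, -1>, both with "love" edges (+2).
user_a = (1, -2)
user_c = (1, -1)
democratic = solve_node_2d([user_a, user_c], [2, 2])
print(democratic)  # (2.0, 0.0), matching the slide's solution <2, 0>
```

With more edges than dimensions (the slides mention 50 floats per taste vector), an exact solution generally does not exist, so a real system would need some form of approximate fit; the slides do not say which method is used.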
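
Step 2 of the serving flow on slide 12 (find items close to the user's coordinates) can be sketched as a brute-force nearest-neighbor search. The assumptions here are loud ones: Euclidean distance as the measure of "close", an in-memory dictionary instead of Cassandra, and made-up item ids and coordinates. The function name `recommend` is hypothetical.

```python
import math


def recommend(user_coord, item_coords, k=2):
    """Return the ids of the k items nearest to the user in taste space.

    Assumption: "close" means small Euclidean distance. The slides do not
    state which distance measure the Rec. Engine uses.
    """
    ranked = sorted(
        item_coords,
        key=lambda item_id: math.dist(user_coord, item_coords[item_id]),
    )
    return ranked[:k]


# Illustrative data only: these item ids and coordinates are invented.
items = {
    "camera_a": (0.8, -2.4),
    "camera_b": (1.0, -1.0),
    "camera_c": (-1.0, 2.0),
}
user = (0.75, -2.5)  # User A's updated coordinate from slide 10
top = recommend(user, items, k=2)
print(top)  # ['camera_a', 'camera_b']
```

A linear scan like this would not scale to the 2 billion item nodes on slide 13; it only illustrates the idea of ranking by proximity.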
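
The sizing figures on slides 13 to 15 can be cross-checked with simple arithmetic. This sketch assumes 4-byte floats (implied by 200 bytes for 50 floats), decimal units (1 TB = 1000 GB), and data spread evenly across machines; none of these are stated in the slides.

```python
# Figures taken from slides 13 to 15.
FLOATS_PER_VECTOR = 50
BYTES_PER_FLOAT = 4          # assumption: 200 bytes / 50 floats
ITEM_NODES = 2_000_000_000
USER_NODES = 200_000_000
DATA_TB = 5
REPLICATION_FACTOR = 2
MACHINES = 32
EXPECTED_GROWTH = 4          # "quadruple next year"

vector_bytes = FLOATS_PER_VECTOR * BYTES_PER_FLOAT
item_vectors_gb = ITEM_NODES * vector_bytes / 1e9
user_vectors_gb = USER_NODES * vector_bytes / 1e9
replicated_tb = DATA_TB * REPLICATION_FACTOR
per_machine_gb = replicated_tb * 1000 / MACHINES   # assumes an even spread
projected_replicated_tb = replicated_tb * EXPECTED_GROWTH

print(vector_bytes)             # 200
print(item_vectors_gb)          # 400.0
print(user_vectors_gb)          # 40.0
print(replicated_tb)            # 10
print(per_machine_gb)           # 312.5
print(projected_replicated_tb)  # 40
```

Under these assumptions the taste vectors account for roughly 440 GB of the 5 TB, so most of the data is edges, and a fourfold increase to about 40 TB replicated would exceed the 32 TB of SSD listed on slide 14.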
