Modeling taste with Cassandra




Affinity is based on user tastes, preferences, and interests

What is a taste profile?

               Operational definition: the set of things you like and dislike

Stuff I like                                   Stuff I don’t like




     Challenge: how do you build the taste profile for someone, i.e. the set of things they like and dislike?
Thesis: Likes are correlated
Inferring correlations
[Figure: taste graph of nodes A, B, C, D and E; a "?" marks the affinity to infer]

1) User A:
     • Democrat
     • Likes Arugula
2) User B:
     • Republican
     • Dislikes Arugula
3) User C indicates:
     • Democrat

What would we infer is User C’s affinity for Arugula?

Answer: User C would like Arugula.
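
The inference above can be sketched as a tiny neighbor vote: average the arugula affinity of the known users who share User C’s party. This is a minimal illustration that assumes likes are encoded as +1 and dislikes as -1; the user records and the `infer_affinity` helper are hypothetical, not part of the original system.

```python
from statistics import mean

# Hypothetical records: +1 = likes arugula, -1 = dislikes it (assumed encoding)
users = {
    "A": {"party": "Democrat", "arugula": +1},
    "B": {"party": "Republican", "arugula": -1},
}


def infer_affinity(party, item, known):
    """Average the item affinity of known users who share the given party."""
    peers = [u[item] for u in known.values() if u["party"] == party and item in u]
    return mean(peers) if peers else 0.0  # 0.0 = no evidence either way


# User C has only told us they are a Democrat
print(infer_affinity("Democrat", "arugula", users))  # positive: guess "likes"
```

Real taste data is far sparser and noisier than two users, which is what motivates the correlation and latent-factor slides that follow.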
Inferring correlations

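[Figure: 2D plot with a horizontal axis from "Dislike Obama" to "Like Obama" and a vertical axis from "Dislike arugula" to "Like arugula"]

Points are <Obama affinity, arugula affinity>:
  User A <3, 2.5>
  User B <-2, -1.5>
  User C <2, ?>
  Other users: <1, 1>, <-3, -3>

If someone’s affinity for Obama is 2.0, what is their affinity for arugula?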
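
One way to answer the question is to fit a straight line through the known points and read off its value at Obama = 2.0. The sketch below uses ordinary least squares; that choice and the `fit_line` helper are assumptions, since the slide only poses the question.

```python
# Known <Obama affinity, arugula affinity> points from the plot
points = [(3, 2.5), (1, 1), (-2, -1.5), (-3, -3)]


def fit_line(pts):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(points)
prediction = slope * 2.0 + intercept  # User C's Obama affinity is 2.0
print(f"predicted arugula affinity: {prediction:.2f}")  # predicted arugula affinity: 1.75
```

The exact estimate depends on which points are available and on the fitting method; adding more users and items moves it.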
Discovering latent factors
[Figure: the same plot with items placed along a diagonal axis running from "Conservative" (bottom left) to "Liberal" (top right)]

Items: Obama <5, 5>, Arugula <4, 4>, Iceberg <-4, -4>, GOP <-5, -5>
Users: User A <3, 2>, User B <-2, -1.5>, others at <1, 1> and <-3, -3>
User C <2, 1.5>: predict 1.5 for how much this person will like arugula.
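
The single diagonal axis can be recovered from the coordinates with principal component analysis (PCA), which is swapped in here purely for illustration; the slides do not say how the latent factors are found. The sketch computes the top eigenvector of the 2x2 covariance matrix in closed form; `principal_axis` is a hypothetical helper.

```python
import math

# <Obama affinity, arugula affinity> for the known items and users on the slide
points = [(5, 5), (4, 4), (3, 2), (1, 1), (-2, -1.5), (-3, -3), (-4, -4), (-5, -5)]


def principal_axis(pts):
    """Return (unit direction, share of variance explained) of the top component."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for the larger eigenvalue lam1
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    total = lam1 + lam2
    return (vx / norm, vy / norm), (lam1 / total if total > 0 else 1.0)


direction, explained = principal_axis(points)
print(direction, explained)  # two nearly equal positive components; share close to 1
```

A direction with nearly equal components is the Liberal/Conservative diagonal: one latent factor accounts for almost all of the spread in both observed affinities.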
Taste space = many latent factors

[Figure: 3D taste space with axes Liberal/Conservative, Extroverted/Introverted and Masculine/Feminine]

Points: <0.7, 4.4, -0.1>, A <0.5, 2.4, -0.4>, B <-0.5, -3.1, 0.1>
What is a taste profile?

                 Operational definition: a coordinate in taste space

Stuff I like (close to me in taste space)   Stuff I don’t like (far away in taste space)




           (Earlier operational definition: the set of things you like and dislike)
        Challenge: how do you calculate taste coordinates?
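
Under this definition, liking something means being close to it in taste space. A minimal sketch using the three coordinates from the "Taste space = many latent factors" slide, assuming Euclidean distance as the notion of "close" (the slides do not name a metric); the unlabeled point is called `X` here, a hypothetical name.

```python
import math

# Coordinates from the 3-factor slide; "X" is a hypothetical name for the unlabeled point
taste = {
    "X": (0.7, 4.4, -0.1),
    "A": (0.5, 2.4, -0.4),
    "B": (-0.5, -3.1, 0.1),
}


def distance(u, v):
    """Euclidean distance between two taste coordinates."""
    return math.dist(u, v)


for name in ("A", "B"):
    print(name, round(distance(taste["X"], taste[name]), 2))
# X is much closer to A than to B, so A is predicted to like X and B to dislike it
```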
Calculating taste coordinates
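[Figure: taste graph of nodes A, B, C, D and E; the Democratic node (marked "?") has unknown coordinates <x, y>; A = <1, -2>, B = <-1, 2>, C = <1, -1>, E = <1, -0.5>; edges are weighted 2 or -2]

Edge weight = dot product of nodes, to constrain similar items to be close to each other.

Assume edge weights of:
   +2 = “love”
   -2 = “hate”

Democratic node must solve:
   1*x - 2*y = 2   (edge from A)
   1*x - 1*y = 2   (edge from C)

Subtracting the second equation from the first gives y = 0, so x = 2.

Solution = <2, 0>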
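
The same calculation generalizes to any number of edges and dimensions (the v1 schema stores 50). A sketch, assuming each edge asks for dot(neighbor, node) = weight and solving in the least-squares sense through the normal equations; the solver and the `solve_node` name are our choices, not the slides'.

```python
def solve_node(edges):
    """edges: list of (neighbor_vector, weight). Returns the node's taste vector."""
    dim = len(edges[0][0])
    # Normal equations (M^T M) u = M^T w, where the rows of M are neighbor vectors
    ata = [[sum(v[i] * v[j] for v, _ in edges) for j in range(dim)] for i in range(dim)]
    atb = [sum(v[i] * w for v, w in edges) for i in range(dim)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    aug = [row[:] + [b] for row, b in zip(ata, atb)]
    for col in range(dim):
        pivot = max(range(col, dim), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("edges do not determine this coordinate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(dim):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][dim] / aug[i][i] for i in range(dim)]


# Democratic node: "love" (+2) edges from A = <1, -2> and C = <1, -1>
democrat = solve_node([((1, -2), 2), ((1, -1), 2)])
print(democrat)  # approximately <2, 0>
```

With fewer independent edges than dimensions the normal equations are singular, which is why this sketch raises instead of guessing; a production solver would typically add regularization.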
Updating taste coordinates

          User A purchases a camera...

[Figure: the taste graph before and after the purchase; B <-1, 2>, C <1, -1> and the nodes at <-1, 0.5> and <1, -0.5> keep their coordinates, while User A moves from <1, -2> to <0.75, -2.5>]

          As a result, the highlighted (blue) coordinates change.
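
The slides do not say how a new edge moves the coordinates. As one illustration, a single stochastic-gradient step on the squared error (weight - dot(user, item))^2 pulls both endpoints toward satisfying the new edge. SGD is a swapped-in technique here, and the camera vector, the learning rate and treating the purchase as a +2 ("love") edge are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgd_step(user, item, weight, lr=0.05):
    """One gradient step on (weight - dot(user, item))**2 for both endpoints."""
    err = weight - dot(user, item)
    # Both updates use the pre-step vectors
    new_user = [u + lr * err * i for u, i in zip(user, item)]
    new_item = [i + lr * err * u for u, i in zip(user, item)]
    return new_user, new_item


user_a = [1.0, -2.0]   # User A before the purchase
camera = [-1.0, 0.5]   # hypothetical camera coordinates
before = dot(user_a, camera)
user_a, camera = sgd_step(user_a, camera, weight=2.0)  # purchase treated as "love"
after = dot(user_a, camera)
print(before, after)   # the dot product moves toward 2
```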
v1 System overview - Model updates

[Diagram: Rec. Engine, Updater, Reco. DB (User -> coord, Item -> coord) and Taste graph]

1) The Updater receives an event (e.g., a Purchase)
2a) Write the Purchase edge to the Taste graph
2b) Read the other edges for this user and item
3) Write the user and item coordinates to the Reco. DB
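
The three update steps can be sketched end to end with in-memory dictionaries standing in for the Cassandra taste graph and the Reco. DB. Everything here is hypothetical: the edge types and weights, the 2-dimensional vectors (the schema uses 50), and the re-estimation rule, which runs a few gradient passes over a node's edges rather than whatever the real Updater does.

```python
import random
from collections import defaultdict

DIM = 2                                            # the v1 schema uses 50
EDGE_WEIGHT = {"purchase": 2.0, "dislike": -2.0}   # hypothetical edge types

user_edges = defaultdict(list)  # stand-in for User Edges: user_id -> [(ts, type, item_id)]
item_edges = defaultdict(list)  # stand-in for Item Edges: item_id -> [(ts, type, user_id)]
reco_db = {}                    # stand-in for the Reco. DB: (kind, id) -> taste vector


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def initial_vector(key):
    """Small deterministic pseudo-random start for a node seen for the first time."""
    rng = random.Random(key)
    return [rng.uniform(-0.5, 0.5) for _ in range(DIM)]


def coord(kind, node_id):
    key = (kind, node_id)
    if key not in reco_db:
        reco_db[key] = initial_vector(f"{kind}:{node_id}")
    return reco_db[key]


def refit(vec, neighbors, passes=20, lr=0.05):
    """Gradient passes pulling dot(vec, neighbor) toward each edge weight."""
    vec = list(vec)
    for _ in range(passes):
        for nvec, weight in neighbors:
            err = weight - dot(vec, nvec)
            vec = [a + lr * err * b for a, b in zip(vec, nvec)]
    return vec


def handle_event(ts, edge_type, user_id, item_id):
    # 2a) write the new edge under both the user row and the item row
    user_edges[user_id].append((ts, edge_type, item_id))
    item_edges[item_id].append((ts, edge_type, user_id))
    # 2b) read this user's edges, then 3) write the refit user vector
    user_nbrs = [(coord("item", i), EDGE_WEIGHT[t]) for _, t, i in user_edges[user_id]]
    reco_db[("user", user_id)] = refit(coord("user", user_id), user_nbrs)
    # same for the item, using the freshly written user vector
    item_nbrs = [(coord("user", u), EDGE_WEIGHT[t]) for _, t, u in item_edges[item_id]]
    reco_db[("item", item_id)] = refit(coord("item", item_id), item_nbrs)


handle_event(1, "purchase", "alice", "cam1")  # 1) receive a Purchase event
```

Refitting the item against the just-written user vector, rather than the stale one, guarantees that a single new edge never leaves its own error larger than before.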
v1 System overview - Rec serving

[Diagram: Rec. Engine, Updater, Reco. DB (User -> coord, Item -> coord) and Taste graph]

1) Page load requests recommendations from the Rec. Engine
2) The Rec. Engine finds other cameras close to the user’s coordinates
3) Recommendations are shown to the user
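
Step 2 amounts to a nearest-neighbor query over item coordinates, filtered to one category. A minimal sketch, assuming Euclidean distance and a brute-force scan with `heapq.nsmallest`; the item data is made up, and at 2 billion items a real system would need an index or approximate search instead of a full scan.

```python
import heapq
import math

# Hypothetical catalog: item_id -> (category, taste coordinates)
items = {
    "cam-1": ("camera", (1.0, -0.5)),
    "cam-2": ("camera", (-1.0, 0.5)),
    "cam-3": ("camera", (0.9, -2.2)),
    "lens-1": ("lens", (1.0, -2.0)),
}


def recommend(user_vec, category, k=2):
    """Return the k items of `category` nearest to user_vec (Euclidean distance)."""
    candidates = (
        (math.dist(user_vec, vec), item_id)
        for item_id, (cat, vec) in items.items()
        if cat == category
    )
    return [item_id for _, item_id in heapq.nsmallest(k, candidates)]


# User A's coordinates after the camera purchase
print(recommend((0.75, -2.5), "camera"))  # ['cam-3', 'cam-1']
```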
v1 Taste Graph data size


40 billion edges
2 billion item nodes
200 million user nodes

5TB of data, takes up 10TB with Replication Factor of 2

We expect this to quadruple next year as we get more events and add
  new types of edges




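
A quick capacity check on these figures, combined with the 32 machines and 1TB SSDs listed on the configuration slide, assuming data spreads evenly across nodes and using decimal terabytes throughout:

```python
# Figures from this slide; machine count and SSD size from the configuration slide
raw_tb = 5
replication_factor = 2
machines = 32
ssd_tb_per_machine = 1

# Node vectors: (200M users + 2B items) x 200 bytes, in decimal TB
node_vector_tb = (200e6 + 2e9) * 200 / 1e12

stored_tb = raw_tb * replication_factor          # 10 TB on disk today
per_machine_tb = stored_tb / machines            # assumes an even spread
next_year_per_machine_tb = per_machine_tb * 4    # "quadruple next year"

print(f"node vectors: {node_vector_tb:.2f} TB of the {raw_tb} TB")
print(f"per node today: {per_machine_tb} TB, next year: {next_year_per_machine_tb} TB")
print("fits on a 1TB SSD next year:", next_year_per_machine_tb <= ssd_tb_per_machine)
```

On these assumptions the node vectors are under a tenth of the 5TB, so edges dominate, and a fourfold increase would put about 1.25TB of replicated data on each 1TB SSD, before any space compaction needs.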
v1 Taste Graph DB configuration


32 Linux machines, each with:
  128GB RAM
  1TB iSCSI SSD
  10 GigE NIC


Cassandra version 1.0.8

8GB JVM heap space

Size-tiered compaction strategy
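
Cassandra serves SSTable reads largely through the operating system's page cache, so the RAM left over after the 8GB heap bounds how much data can stay hot. A rough per-node estimate, assuming the 10TB of replicated data from the data-size slide is spread evenly over the 32 machines and that all non-heap RAM is available as page cache (an overestimate, since the OS and off-heap structures need some):

```python
ram_gb = 128
heap_gb = 8
data_per_node_gb = 10_000 / 32   # 10TB replicated over 32 machines (decimal units)

page_cache_gb = ram_gb - heap_gb
cached_fraction = min(1.0, page_cache_gb / data_per_node_gb)
print(f"{page_cache_gb} GB page cache for {data_per_node_gb:.1f} GB of data "
      f"(~{cached_fraction:.0%} cacheable)")
```

Under these assumptions a little over a third of each node's data fits in memory, so cold reads fall through to the SSD.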
v1 Taste Graph schema

Column family   Row key   Column name(s)                        Value
User Edges      user_id   (timestamp, edge_type, item_id) …     <empty>
Item Edges      item_id   (timestamp, edge_type, user_id) …     <empty>
User Nodes      user_id   tastevector                           200 bytes (50 floats)
Item Nodes      item_id   tastevector                           200 bytes (50 floats)
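
The two storage layouts can be mimicked in plain Python: an edge row is a list of column names kept sorted by the composite (timestamp, edge_type, id), which makes time-range slices cheap, and a tastevector is 50 four-byte floats, hence 200 bytes. Big-endian float32 packing and the `EdgeRow` class are assumptions for illustration; the slides give only the sizes.

```python
import bisect
import struct

DIM = 50


def pack_tastevector(vec):
    """Serialize 50 floats into exactly 200 bytes (big-endian float32, assumed)."""
    if len(vec) != DIM:
        raise ValueError(f"expected {DIM} floats, got {len(vec)}")
    return struct.pack(f">{DIM}f", *vec)


def unpack_tastevector(blob):
    return list(struct.unpack(f">{DIM}f", blob))


class EdgeRow:
    """One User Edges row: column names are (timestamp, edge_type, item_id), values empty."""

    def __init__(self):
        self.columns = []  # kept sorted, like columns ordered by a composite comparator

    def insert(self, timestamp, edge_type, item_id):
        bisect.insort(self.columns, (timestamp, edge_type, item_id))

    def slice(self, start_ts, end_ts):
        """All edges with start_ts <= timestamp < end_ts (a column slice)."""
        lo = bisect.bisect_left(self.columns, (start_ts,))
        hi = bisect.bisect_left(self.columns, (end_ts,))
        return self.columns[lo:hi]


blob = pack_tastevector([0.5] * DIM)
row = EdgeRow()
row.insert(1700, "purchase", "cam-1")
row.insert(1600, "view", "cam-2")
row.insert(1800, "purchase", "lens-1")
print(len(blob), row.slice(1650, 1800))  # 200 [(1700, 'purchase', 'cam-1')]
```

Putting the timestamp first in the composite name is what lets the Updater read "recent edges for this user" as one contiguous slice of a single row.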
v1 Real-time taste updates

[Chart: edges and nodes read per second]
v1 Real-time taste updates

[Chart: edges and nodes written per second]
Questions?


tp@hunch.com





Graph Based Recommendation Systems at eBay
