Meta-Prod2Vec: Simple Product Embeddings with Side-Information


Talk given by Flavian Vasile, Criteo, during the RecsysFR meetup on October 6th 2016.


  1. Meta-Prod2Vec: Simple Product Embeddings with Side-Information
     Flavian Vasile, Elena Smirnova @Criteo; Alexis Conneau @FAIR
  2. Contents
     • Product Embeddings for Recommendation
     • Embedding CF signal: Word2Vec and Prod2Vec
     • Meta-Prod2Vec: Embedding with Side-Information
     • Experimental Results
     • Conclusions
  3. Product Embeddings for Recommendation
  4. Product Embeddings for Recommendation
     Represent items (and sometimes users) as vectors in the same space and use their distances to compute recommendations.
  5. • At a certain level, nothing new!
     • We already had Matrix Factorization
     • It is yet another way of creating latent representations for Recommendation
  6. Some of the NN methods can be translated back into MF techniques.
     Differences:
     • new ways to compute matrix entries
     • new loss functions
  7. Where do we fit?
     • Hybrid model that uses CF with content side-information
     • Incursion on the embedding methods using side info
  8. Embedding CF signal: Word2Vec and Prod2Vec
  9. Word2Vec: Skip-gram
     (Word-to-hidden matrix) x (Hidden-to-Word context matrix)
  10. Word2Vec
      In this space, words that appear in similar contexts will tend to be close:
  11. The same idea can be applied to other sequential data, such as user shopping sessions: Prod2Vec.
  12. Prod2Vec
      Words = products
      Sentences = shopping sessions
      Grbovic et al. E-commerce in Your Inbox: Product Recommendations at Scale, KDD 2015
  13. Prod2Vec
      The resulting embedding will co-locate products that appear in the vicinity of the same products.
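The "words = products, sentences = shopping sessions" mapping can be sketched as a pair generator. This is a minimal illustration, not the authors' code: the function name `session_pairs`, the `window` parameter, and the toy sessions are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def session_pairs(sessions: Iterable[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) product pairs, skip-gram style.

    Two products form a pair when they appear within `window`
    positions of each other in the same session.
    """
    pairs: List[Tuple[str, str]] = []
    for session in sessions:
        for i, center in enumerate(session):
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a product is not its own context
                    pairs.append((center, session[j]))
    return pairs


# Hypothetical toy sessions, for illustration only.
sessions = [["shoes", "socks", "laces"], ["socks", "laces"]]
pairs = session_pairs(sessions, window=1)
```

Products that share many context products end up with many pairs in common, which is what pulls their vectors together during training.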
  14. Prod2Vec loss function
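The formula on this slide was an image and did not survive in the transcript. As a hedged reconstruction, assuming the standard softmax skip-gram objective, the loss is the count-weighted negative log-likelihood of seeing context product j given product i. The symbols X_{ij}, w_i and w'_j are notation chosen here, not recovered from the slide.

```latex
L_{P2V} = -\sum_{i}\sum_{j} X_{ij} \, \log q_{j \mid i}(\theta),
\qquad
q_{j \mid i}(\theta) = \frac{\exp\left(w_i^{\top} w'_j\right)}{\sum_{k} \exp\left(w_i^{\top} w'_k\right)}
```

Here X_{ij} would count how often product j appears in the context of product i, and w_i, w'_j would be the input and output embedding vectors. Word2Vec implementations commonly approximate the softmax with negative sampling.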
  15. Meta-Prod2Vec: Embedding with Side-Information
  16. Meta-Prod2Vec: Embedding with Side-Information
      Idea: Use not only the product sequence information, but also product meta-data.
  17. Where is it useful?
      Product cold-start, when sequence information is sparse.
  18. How can it help?
      We place additional constraints on product co-occurrences using external info.
      We can create more noise-robust embeddings for products suffering from cold-start.
  19. Type of product side-information:
      • Categories
      • Brands
      • Title & Description
      • Tags
  20. How does Meta-Prod2Vec leverage this information for cold-start?
  21. Motivating example:
  22. Let's say we are trying to build a recommender system for songs...
  23. We want to build a very simple solution that, based on the last song the user heard, recommends the next song.
  24. Two different recommendation situations:
      • Simple: the previous song is popular
      • Hard: the previous song is relatively unknown (suffers from cold start).
  25. Simple case:
      Query song: Shake It Off by Taylor Swift.
      Best next song: All About That Bass by Meghan Trainor.
      CF and Prod2Vec both work!
  26. Hard case:
      Query song: Still by Taylor Swift, but is one of her earlier songs, e.g. You're Not Sorry.
      Best next song: ?
  27. Hard case + unlucky:
      • Just one user listened to You're Not Sorry
      • He also listened to Rammstein's Du Hast!
  28. Hard case + unlucky:
      Your Recommendation Is Not Working!
  29. This is where Meta-Prod2Vec comes in handy!
  30. When computing how plausible it is for a user to like a pair of songs, you can place additional constraints by taking into account the song artists.
  31. Prod2Vec constraints
      [Diagram nodes: You're Not Sorry, Du Hast]
      P(Du Hast | You're Not Sorry) -> the next song depends on the current song
  32. Prod2Vec constraints
      [Diagram nodes: You're Not Sorry, Du Hast]
      You're Not Sorry is a fringe song -> low evidence for the positive and negative pairs
  33. Artist metadata constraints
      [Diagram nodes: You're Not Sorry, Du Hast, Taylor Swift, Rammstein]
      However, the associated singer is popular -> good evidence that Taylor Swift and Rammstein do not really co-occur (have distant vectors)
  34. Artist and Song constraints (1)
      [Diagram nodes: You're Not Sorry, Du Hast, Taylor Swift, Rammstein]
      Furthermore, we can enforce that the songs and their artists should be close...
  35. Artist and Song constraints (2)
      [Diagram nodes: You're Not Sorry, Du Hast, Taylor Swift, Rammstein]
      Finally, we add two more constraints between the artists and the previous/next song (they still have more support than the original pairs)
  36. Meta-Prod2Vec constraints
      [Diagram nodes: You're Not Sorry, Du Hast, Taylor Swift, Rammstein]
      #1. P(Rammstein | You're Not Sorry): the artist of the next song should be plausible given the current song
      #2. P(Du Hast | Taylor Swift): the next song should depend on the current artist selection
      #3. P(You're Not Sorry | Taylor Swift) and P(Du Hast | Rammstein): the current artist selection should also influence the current song selection
      #4. P(Rammstein | Taylor Swift): the probability of the next artist should be high given the current artist.
  37. Putting it all together: Meta-Prod2Vec loss
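The loss formula itself is missing from the transcript. One way to write it, consistent with the four constraint types listed on the Meta-Prod2Vec constraints slide and with the importance hyperparameter mentioned on the implementation slide, is sketched below. The notation is an assumption: I and J stand for the current and next product, M and M' for the current and next product's metadata, and λ for the importance weight.

```latex
L_{MP2V} = L_{J \mid I}
  + \lambda \left( L_{M' \mid I} + L_{J \mid M} + L_{I \mid M} + L_{M' \mid M} \right)
```

Each term L_{A | B} would be a skip-gram loss for predicting A from B. L_{J | I} is the original Prod2Vec term, and the four terms in parentheses correspond to constraints #1, #2, #3 and #4 in that order.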
  38. Relationship with MF with Side-Info:
  39. MP2V Implementation
      • No changes in the Word2Vec code!
      • Changes just in the input pairs: we generate (proportionally to the importance hyperparameter) 4 additional types of pairs.
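The "changes just in the input pairs" idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `meta_prod2vec_pairs`, the `meta:` token prefix, the restriction to consecutive items, and the reading of the importance hyperparameter as a keep-probability between 0 and 1 are all hypothetical choices.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (input token, output token)


def meta_prod2vec_pairs(session: List[str], meta_of: Dict[str, str],
                        importance: float = 1.0, seed: int = 0) -> List[Pair]:
    """Emit Prod2Vec pairs plus the four metadata pair types.

    Assumption: each side-information pair is kept with probability
    `importance`; the original item-to-item pairs are always kept.
    """
    rng = random.Random(seed)

    def keep() -> bool:
        return rng.random() < importance

    def tag(item: str) -> str:
        # Prefix metadata tokens so they cannot collide with item ids.
        return "meta:" + meta_of[item]

    pairs: List[Pair] = []
    for cur, nxt in zip(session, session[1:]):
        pairs.append((cur, nxt))       # Prod2Vec: next item given current item
        side = [
            (cur, tag(nxt)),           # 1: next metadata given current item
            (tag(cur), nxt),           # 2: next item given current metadata
            (tag(cur), tag(nxt)),      # 4: next metadata given current metadata
        ]
        pairs.extend(p for p in side if keep())
    for item in session:
        if keep():
            pairs.append((tag(item), item))  # 3: item given its own metadata
    return pairs


session = ["You're Not Sorry", "Du Hast"]
artist = {"You're Not Sorry": "Taylor Swift", "Du Hast": "Rammstein"}
pairs = meta_prod2vec_pairs(session, artist)
```

The resulting list mixes item tokens and metadata tokens in one vocabulary, so it can be fed to an unmodified skip-gram trainer that accepts explicit pairs.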
  40. Experimental Results
  41. Task & Metrics
      Task: Next Event Prediction
      Metrics:
      • Hit ratio at K (HR@K)
      • Normalized Discounted Cumulative Gain (NDCG@K)
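For readers unfamiliar with the two metrics, here is a minimal sketch for a single query. It assumes exactly one ground-truth next item per query. The talk does not give the exact definitions, so this is one common convention and not necessarily the one behind the reported numbers.

```python
import math
from typing import Sequence


def hit_ratio_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1.0 if the true next item is among the top-k recommendations, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """NDCG@k with a single relevant item.

    The ideal DCG is 1 (target at rank 1), so the score reduces to
    1 / log2(rank + 1) when the target is in the top k, and 0 otherwise.
    """
    for rank, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0


# Hypothetical ranking: the true next item "c" sits at rank 3.
ranked = ["a", "b", "c", "d"]
hr_at_2 = hit_ratio_at_k(ranked, "c", k=2)
hr_at_3 = hit_ratio_at_k(ranked, "c", k=3)
ndcg_at_3 = ndcg_at_k(ranked, "c", k=3)
```

Averaging these per-query values over all test events gives the dataset-level HR@K and NDCG@K. HR only checks whether the target is present, while NDCG also rewards placing it higher in the list.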
  42. Methods
      BestOf: (rank by) popularity
      CoCounts: cosine similarity of candidate item to query item
      Prod2Vec: cosine similarity of item embedding vectors
      Meta-Prod2Vec: cosine similarity of improved embedding vectors
      Mix(Prod2Vec, CoCounts): linear combination of the two scores
      Mix(Meta-Prod2Vec, CoCounts): same as previous
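The embedding-based and mixed scorers can be sketched as below. The mixing weight `alpha`, the function names, and the toy vectors and co-count similarities are assumptions for illustration. The talk only says the two scores are combined linearly.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(query: str, vectors: Dict[str, List[float]],
                    cocount_sim: Dict[str, float], alpha: float) -> List[str]:
    """Rank all other items by alpha * embedding score + (1 - alpha) * co-count score."""
    scores = {
        item: alpha * cosine(vectors[query], vec)
        + (1.0 - alpha) * cocount_sim.get(item, 0.0)
        for item, vec in vectors.items()
        if item != query
    }
    return sorted(scores, key=lambda item: scores[item], reverse=True)


# Hypothetical data: "a" is close to the query in embedding space,
# "b" co-occurs with it more often.
vectors = {"q": [1.0, 0.0], "a": [0.8, 0.6], "b": [0.0, 1.0]}
cocount_sim = {"a": 0.1, "b": 0.9}
by_embedding = rank_candidates("q", vectors, cocount_sim, alpha=1.0)
by_cocounts = rank_candidates("q", vectors, cocount_sim, alpha=0.0)
```

Setting `alpha` to 1.0 or 0.0 recovers the pure embedding and pure co-count rankings. Intermediate values trade off the two signals.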
  43. Dataset: 30Music Dataset
      • playlists data from API
      • sample of 100k user sessions
      • resulting vocabulary size: 433k songs and 67k artists.
  44. Global Results
      | Method                       | Type   | HR@20  | NDCG@20 |
      |------------------------------|--------|--------|---------|
      | BestOf                       | Head   | 0.0003 | 0.002   |
      | CoCounts                     | Head   | 0.0160 | 0.141   |
      | Prod2Vec                     | Tail   | 0.0101 | 0.113   |
      | MetaProd2Vec                 | Tail   | 0.0124 | 0.125   |
      | Mix(Prod2Vec, CoCounts)      | Global | 0.0158 | 0.152   |
      | Mix(MetaProd2Vec, CoCounts)  | Global | 0.0180 | 0.161   |
  45. Results on Cold Start (HR@20)
      | Method                       | Type   | Pair freq = 0 | Pair freq < 3 |
      |------------------------------|--------|---------------|---------------|
      | BestOf                       | Head   | 0.0002        | 0.0002        |
      | CoCounts                     | Head   | 0.0000        | 0.0197        |
      | Prod2Vec                     | Tail   | 0.0003        | 0.0078        |
      | MetaProd2Vec                 | Tail   | 0.0013        | 0.0198        |
      | Mix(Prod2Vec, CoCounts)      | Global | 0.0002        | 0.0200        |
      | Mix(MetaProd2Vec, CoCounts)  | Global | 0.0007        | 0.0291        |
  46. Conclusions and Next Steps
  47. Conclusions and Next Steps
      Using side-info for product embeddings helps, especially on cold-start.
  48. Conclusions and Next Steps
      • Better ways to mix Head and Tail recommendation methods
      • Mix CF and Meta-Data at test time: product embeddings using all available signal (CF, categorical, text and image product information)
  49. Thanks!
  50. Questions?