4. Product Embeddings for Recommendation
Represent items (and sometimes users) as vectors in the same space and use their distances to compute recommendations.
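As a minimal sketch of this idea, the snippet below ranks items by cosine similarity to a query item's vector. The function name `top_k_similar` and the toy 2-d vectors are made up for illustration; real embeddings would be learned from data.

```python
import numpy as np

def top_k_similar(embeddings, query_idx, k=2):
    """Return indices of the k items closest to the query by cosine similarity."""
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    scores = unit @ unit[query_idx]
    scores[query_idx] = -np.inf  # never recommend the query item itself
    return np.argsort(-scores)[:k].tolist()

# Toy item vectors (hypothetical values, for illustration only).
items = np.array([
    [1.0, 0.0],   # item 0 (the query)
    [0.9, 0.1],   # item 1: nearly the same direction as item 0
    [0.0, 1.0],   # item 2: orthogonal to item 0
    [-1.0, 0.0],  # item 3: opposite direction
])
recommended = top_k_similar(items, query_idx=0, k=2)
print(recommended)
```

Items whose vectors point in nearly the same direction as the query are recommended first.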
5. • At a certain level, nothing new!
• We already had Matrix Factorization
• It is yet another way of creating latent representations for Recommendation
6. Some of the NN methods can be translated back into MF techniques.
Differences:
• new ways to compute matrix entries
• new loss functions
7. Where do we fit?
• Hybrid model that uses CF with content side-information
• Incursion on the embedding methods using side info
16. Meta-Prod2Vec: Embedding with Side-Information
Idea: Use not only the product sequence information, but also product meta-data.
17. Where is it useful?
Product cold-start, when sequence information is sparse.
18. How can it help?
We place additional constraints on product co-occurrences using external info.
We can create more noise-robust embeddings for products suffering from cold-start.
19. Types of product side-information:
• Categories
• Brands
• Title & Description
• Tags
22. Let’s say we are trying to build a recommender system for songs...
23. We want to build a very simple solution that, based on the last song the user heard, recommends the next song.
24. Two different recommendation situations:
• Simple one: the previous song is popular
• Hard one: the previous song is relatively unknown (suffers from cold start).
25. Simple case:
Query song: Shake It Off by Taylor Swift.
Best next song: It’s All About the Bass by Meghan Trainor.
CF and Prod2Vec both work!
26. Hard case:
Query song: still by Taylor Swift, but one of her earlier songs, e.g. You’re Not Sorry.
Best next song: ?
27. Hard case + unlucky:
• Just one user listened to You’re Not Sorry
• He also listened to Rammstein’s Du Hast!
28. Hard case + unlucky:
Your Recommendation Is Not Working!
30. When computing how plausible it is for a user to like a pair of songs, you can place additional constraints by taking into account the song artists.
31. Prod2Vec constraints
Diagram nodes: You’re Not Sorry, Du Hast
P(Du Hast | You’re Not Sorry) -> the next song depends on the current song
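To make the conditional probability concrete, here is a small sketch that assumes the usual word2vec-style softmax over dot products between an input vector for the current song and output vectors for the candidate next songs. The slides do not spell out this formula; the function name `next_item_probs` and the toy vectors are hypothetical.

```python
import numpy as np

def next_item_probs(input_vecs, output_vecs, current):
    """Softmax over dot products: P(next = j | current) for every candidate j."""
    logits = output_vecs @ input_vecs[current]
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Toy vectors (made-up values): row i is the vector of song i.
input_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
output_vecs = np.array([[0.0, 1.0], [2.0, 0.0]])

probs = next_item_probs(input_vecs, output_vecs, current=0)
print(probs)
```

In practice word2vec-style training approximates this softmax (for example with negative sampling) instead of evaluating it over the whole catalogue.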
32. Prod2Vec constraints
Diagram nodes: You’re Not Sorry, Du Hast
You’re Not Sorry is a fringe song -> low evidence for the positive and negative pairs
33. Artist metadata constraints
Diagram nodes: You’re Not Sorry, Du Hast, Taylor Swift, Rammstein
However, the associated singer is popular -> good evidence that Taylor Swift and Rammstein do not really co-occur (have distant vectors)
34. Artist and Song constraints (1)
Diagram nodes: You’re Not Sorry, Du Hast, Taylor Swift, Rammstein
Furthermore, we can enforce that the songs and their artists should be close...
35. Artist and Song constraints (2)
Diagram nodes: You’re Not Sorry, Du Hast, Taylor Swift, Rammstein
Finally, we add two more constraints between the artists and the previous/next song (they still have more support than the original pairs)
36. Meta-Prod2Vec constraints
Diagram nodes: You’re Not Sorry, Du Hast, Taylor Swift, Rammstein
#1. P(Rammstein | You’re Not Sorry): the artist of the next song should be plausible given the current song
#2. P(Du Hast | Taylor Swift): the next song should depend on the current artist selection
#3. P(You’re Not Sorry | Taylor Swift) and P(Du Hast | Rammstein): the current artist selection should also influence the current song selection
#4. P(Rammstein | Taylor Swift): the probability of the next artist should be high given the current artist.
39. MP2V Implementation
• No changes in the Word2Vec code!
• Changes just in the input pairs: we generate (proportionally to the importance hyperparameter) 4 additional types of pairs.
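The pair generation can be sketched as follows. This is an illustration under assumptions, not the authors' code: the helper name `meta_prod2vec_pairs`, the `artist:` token prefix, the context window of one step, and the choice to keep each side-information pair with probability `importance` are all hypothetical ways of reading "proportionally to the importance hyperparameter".

```python
import random

def meta_prod2vec_pairs(session, artist_of, importance=1.0, seed=0):
    """Build (input, output) training pairs for one listening session.

    Emits the plain Prod2Vec pair (song -> next song) plus the four
    side-information pair types. Each side-information pair is kept with
    probability `importance` (an assumed sampling scheme).
    """
    rng = random.Random(seed)
    pairs = []

    def add_side(pair):
        if rng.random() < importance:
            pairs.append(pair)

    def meta(song):
        # Prefix keeps artist tokens distinct from song tokens.
        return "artist:" + artist_of[song]

    for cur, nxt in zip(session, session[1:]):
        pairs.append((cur, nxt))            # Prod2Vec: P(next song | song)
        add_side((cur, meta(nxt)))          # type 1: P(next artist | song)
        add_side((meta(cur), nxt))          # type 2: P(next song | artist)
        add_side((meta(cur), meta(nxt)))    # type 4: P(next artist | artist)
    for song in session:
        add_side((meta(song), song))        # type 3: P(song | its own artist)
    return pairs

artist_of = {"You're Not Sorry": "Taylor Swift", "Du Hast": "Rammstein"}
session = ["You're Not Sorry", "Du Hast"]
all_pairs = meta_prod2vec_pairs(session, artist_of, importance=1.0)
base_only = meta_prod2vec_pairs(session, artist_of, importance=0.0)
```

The resulting pairs could then be fed to any skip-gram trainer that accepts explicit (input, output) pairs, which is what leaves the Word2Vec code itself unchanged.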
41. Task & Metrics
Task: Next Event Prediction
Metrics:
• Hit ratio at K (HR@K)
• Normalized Discounted Cumulative Gain (NDCG@K)
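For next-event prediction with a single correct next item, the two metrics are commonly computed as in the sketch below. The slides do not give the exact definitions used in the evaluation, so the single-target form of NDCG here is an assumption, and the function names are hypothetical.

```python
import math

def hit_ratio_at_k(ranked, target, k):
    """1.0 if the true next item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k):
    """NDCG with one relevant item: 1 / log2(rank + 1), or 0 if it is missed.

    With a single relevant item the ideal DCG is 1, so no further
    normalization is needed.
    """
    top = ranked[:k]
    if target not in top:
        return 0.0
    rank = top.index(target) + 1  # 1-based position in the list
    return 1.0 / math.log2(rank + 1)

ranked = ["a", "b", "c"]          # recommendations, best first
hr = hit_ratio_at_k(ranked, "b", k=2)
ndcg = ndcg_at_k(ranked, "b", k=2)
```

Both values are then averaged over all test events; HR@K only rewards a hit, while NDCG@K also rewards placing the hit near the top.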
42. Methods
BestOf: rank by popularity
CoCounts: cosine similarity of candidate item to query item
Prod2Vec: cosine similarity of item embedding vectors
Meta-Prod2Vec: cosine similarity of improved embedding vectors
Mix(Prod2Vec, CoCounts): linear combination of the two scores
Mix(Meta-Prod2Vec, CoCounts): same as the previous mix, using Meta-Prod2Vec scores
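The Mix methods can be illustrated with a short sketch. The weight `alpha`, the treatment of missing candidates as score 0, and the toy scores are assumptions made for illustration; the slides only state that the two scores are linearly combined.

```python
def mix_scores(embedding_scores, cocount_scores, alpha=0.5):
    """Linear combination of two per-candidate score dictionaries.

    A candidate missing from one source is given a score of 0.0 there
    (an assumption; for example, a pair never seen in the co-count data).
    """
    candidates = set(embedding_scores) | set(cocount_scores)
    return {
        c: alpha * embedding_scores.get(c, 0.0)
        + (1.0 - alpha) * cocount_scores.get(c, 0.0)
        for c in candidates
    }

def rank(scores):
    """Candidates sorted by descending score, ties broken by name."""
    return sorted(scores, key=lambda c: (-scores[c], c))

# Hypothetical similarity scores for three candidate next songs.
embedding_scores = {"a": 0.9, "b": 0.2, "c": 0.5}
cocount_scores = {"a": 0.0, "b": 0.8}

mixed = mix_scores(embedding_scores, cocount_scores, alpha=0.5)
ranking = rank(mixed)
```

Here candidate "c" has no co-count evidence at all and still receives a score from its embedding, which is the cold-start situation the mix is meant to cover.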
43. Dataset: 30Music Dataset
• playlist data from the Last.fm API
• sample of 100k user sessions
• resulting vocabulary size: 433k songs and 67k artists.
44. Global Results
Method | Type | HR@20 | NDCG@20
BestOf | Head | 0.0003 | 0.002
CoCounts | Head | 0.0160 | 0.141
Prod2Vec | Tail | 0.0101 | 0.113
MetaProd2Vec | Tail | 0.0124 | 0.125
Mix(Prod2Vec, CoCounts) | Global | 0.0158 | 0.152
Mix(MetaProd2Vec, CoCounts) | Global | 0.0180 | 0.161
45. Results on Cold Start (HR@20)
Method | Type | Pair freq = 0 | Pair freq < 3
BestOf | Head | 0.0002 | 0.0002
CoCounts | Head | 0.0000 | 0.0197
Prod2Vec | Tail | 0.0003 | 0.0078
MetaProd2Vec | Tail | 0.0013 | 0.0198
Mix(Prod2Vec, CoCounts) | Global | 0.0002 | 0.0200
Mix(MetaProd2Vec, CoCounts) | Global | 0.0007 | 0.0291
47. Conclusions and Next Steps
Using side-info for product embeddings helps, especially on cold-start.
48. Conclusions and Next Steps
• Better ways to mix Head and Tail recommendation methods
• Mix CF and Meta-Data at test time - product embeddings using all available signal (CF, categorical, text and image product information)