Recent Advances in Recommender Systems
using Deep Learning
August 20, 2019
Sungkyunkwan University (SKKU)
Prof. Jongwuk Lee
2
Recommender Systems
Basics
What are Recommender Systems?
3
[Diagram: recommendation of items such as products, movies, and news to users.]
Search vs. Recommendation
 How can we help users get access to relevant data?
 Pull mode (search engines)
 Users take initiative.
 Ad-hoc information need
 Push mode (recommender systems)
 Systems take initiative.
 Stable information need, or the system already holds the user’s information need.
4
Product Recommendation
5
Amazon product recommendation
Movie & Music Recommendation
6
Netflix movie recommendation
Spotify music recommendation
News Recommendation
7
Naver AiRs
Kakao Rubics
Various Recommendations
8
Social/friend recommendation, restaurant recommendation
Tag recommendation, app recommendation
What is Collaborative Filtering?
 Given a target user, Alice, find a set of users whose preference
patterns are similar to that of the target user.
 Predict a list of items that Alice is likely to prefer.
9
Target user: Alice
① Inferring Alice’s
preference
② Finding a set of users with
similar preference for Alice
③ Recommending a list of items
that a user group prefers
Top-N
Recommendation
User-Item Rating Matrix
 A user-item rating matrix R of the target user, Alice, and
other users is given:
 R: a user-item rating matrix (𝒎 × 𝒏 matrix)
 Determine whether she would like or dislike the movies that she has not seen yet.
10
[Figure: the m × n user-item rating matrix R; each row is a user (including Alice) and each column is a movie, with observed ratings such as 3, 4, 5 and '?' for unrated movies.]
Latent Factor Models
 How to model user-item interactions?
 U: latent user matrix (𝒎 × 𝒌 matrix)
 Each user is represented by a latent vector (1 × 𝑘 vector).
 V: latent item matrix (𝒏 × 𝒌 matrix)
 Each item is represented by a latent vector (1 × 𝑘 vector).
11
User-item interaction
f(U, V) = ?
Factorizing Two Latent Matrices
 The user-item rating matrix R can be approximated as a
linear combination of two latent matrices U and V.
 R: user-item rating matrix (𝑚 × 𝑛 matrix)
 U: latent user matrix (𝑚 × 𝑘 matrix)
 V: latent item matrix (𝑛 × 𝑘 matrix)
 𝑘: # of latent features
12
[Figure: the rating matrix R (m × n) is approximated by U (m × k) times V^T (k × n), where k is the number of latent features.]
Yehuda Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” KDD 2008
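A minimal NumPy sketch of this factorization, fitted by stochastic gradient descent on the observed ratings only; the toy matrix, learning rate, and regularization constant are illustrative, not values from the slides.

```python
import numpy as np

def factorize(R, k=8, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Approximate R (m x n, NaN = unknown) by U (m x k) @ V.T (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    users, items = np.nonzero(~np.isnan(R))          # observed entries only
    for _ in range(epochs):
        for u, i in zip(users, items):
            pu, qi = U[u].copy(), V[i].copy()
            err = R[u, i] - pu @ qi                   # prediction error
            U[u] += lr * (err * qi - reg * pu)        # SGD update for the user
            V[i] += lr * (err * pu - reg * qi)        # SGD update for the item
    return U, V

# Toy matrix from the slide: '?' entries are NaN (unknown).
R = np.array([[3, 3, np.nan, 2],
              [np.nan, np.nan, 4, 1],
              [5, 4, np.nan, np.nan],
              [3, np.nan, np.nan, 3]], dtype=float)
U, V = factorize(R, k=2)
print(np.round(U @ V.T, 2))   # predicted ratings, including the missing cells
```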
Limitation of Existing Models
 Existing models are mainly based on a linear user-item
interaction.
 However, the user-item interaction may be non-linear and
non-trivial.
13
[Figure: the same R ≈ U × V^T factorization as on the previous slide, which captures only linear interactions.]
14
Deep Recommender
Systems
Statistics of RecSys Models using DNNs
 The number of such models has increased exponentially over the last five years.
 SIGIR, WWW, RecSys, KDD, AAAI, WSDM, NIPS, …
15
Shuai Zhang et al., “Deep Learning based Recommender System: A Survey and New Perspectives,” 2017
Categories of RecSys Models using DNNs
16
Deep Learning based Recommender System
 Neural Network Model
   Models using a single DL technique: MLP, AE, CNN, RNN, DSSM, RBM, NADE, GAN
   Deep composite model
 Integration Model
   Integrating DL with a traditional RS: loosely coupled model, tightly coupled model
   Recommendation relying solely on DL
Categories of RecSys Models using DNNs
17
[Same taxonomy diagram as on the previous slide, repeated.]
AutoRec: Autoencoder-based Model
 For each item, reconstruct its rating vector.
 Only the observed ratings are used to update the model.
18
[Figure: an item's rating vector r (e.g., 2 1 3 …) taken from R is encoded into a hidden representation h(r) with parameters W, b, and decoded into a reconstruction r̂ with parameters W′, b′.]
Suvash Sedhain et al. “AutoRec: Autoencoders Meet Collaborative Filtering,” WWW 2015
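A minimal PyTorch sketch of the AutoRec idea above: reconstruct each item's rating vector and back-propagate the loss only on the observed ratings. Layer sizes, the random toy data, and the hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class AutoRec(nn.Module):
    def __init__(self, num_users, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(num_users, hidden)   # W, b
        self.decoder = nn.Linear(hidden, num_users)   # W', b'
    def forward(self, r):
        return self.decoder(torch.sigmoid(self.encoder(r)))

num_users, num_items = 100, 50
ratings = torch.randint(0, 6, (num_items, num_users)).float()  # 0 = unobserved
mask = (ratings > 0).float()                                   # observed entries

model = AutoRec(num_users)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):
    pred = model(ratings)
    loss = (((pred - ratings) * mask) ** 2).sum() / mask.sum()  # observed ratings only
    opt.zero_grad(); loss.backward(); opt.step()
```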
Denoising Autoencoder (DAE)
 Learn to reconstruct a user’s favorite set of items r̃ from randomly sampled subsets, i.e., a denoising autoencoder.
19
[Figure: the user's rating vector (3 3 2 …) is corrupted into the denoising input r̃ (3 _ 2 …), encoded into h(r̃) with W, b, and decoded into the reconstruction r̂ with W′, b′.]
Yao Wu et al., “Collaborative Denoising Auto-Encoders for Top-N Recommender Systems,” WSDM 2016
Collaborative Denoising Autoencoder
 Learn to reconstruct a user’s favorite set of items r̃ from randomly sampled subsets, i.e., a denoising autoencoder.
 Train one model for all users, with variables shared across items and a user-specific variable (a k × 1 vector) for each user.
20
[Figure: the same denoising autoencoder, with an additional one-hot user vector mapped through V to a user-specific k × 1 vector that feeds into the hidden layer.]
Generalized Matrix Factorization (GMF)
 Embeddings are used to learn latent user and item features.
 Input: one-hot feature vector for user u and item i
 Output: predicted score ŷui
 The element-wise product is the same as in the existing MF model.
21
[Figure: GMF architecture: the one-hot user and item vectors derived from R are mapped through user and item embeddings to latent vectors, combined by an element-wise product layer, and passed through a fully connected layer without bias to produce ŷ_ui.]
Xiangnan He et al., “Neural Collaborative Filtering,” WWW 2017
Step 1: Embedding Users and Items
 User embedding
 Represent a latent feature for each user.
 Item embedding
 Similarly, it represents a latent feature for each item.
22
Multiplying the one-hot user vector [0 1 0 … 0] (a 1 × m vector) by the user embedding matrix (an m × k matrix) selects the corresponding row, i.e., the 1 × k latent user vector [0.6 0.4 … 1.2]. The item embedding works in the same way with the one-hot item vector.
Step 1: Embedding Users and Items
23
[Figure: the user embedding selects the latent user vector [0.6 … 1.2] (a row of U) and the item embedding selects the latent item vector [1.2 … 0.3] (a column of V^T) from the factorization R ≈ U V^T.]
Step 2: Element-wise Product
 This is the same as the linear combination of U and V^T.
24
[Figure: the element-wise product of the latent user vector [0.6 … 1.2] and the latent item vector [1.2 … 0.3] gives [0.72 … 0.36].]
Step 3: Passing Fully Connected Layer
25
[Figure: the element-wise product vector [0.72 … 0.36] is passed through a fully connected layer without bias to produce the predicted score ŷ_ui.]
 If the weights of this layer are all equal, it is the same as the existing MF model: R = w(U ⊗ V^T).
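A minimal PyTorch sketch of GMF as described in Steps 1 to 3 (not the authors' implementation): embed the user and item, take the element-wise product, and score it with a single fully connected layer without bias. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class GMF(nn.Module):
    def __init__(self, num_users, num_items, k=16):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, k)   # Step 1: user embedding
        self.item_emb = nn.Embedding(num_items, k)   # Step 1: item embedding
        self.out = nn.Linear(k, 1, bias=False)       # Step 3: FC layer w/o bias
    def forward(self, u, i):
        p, q = self.user_emb(u), self.item_emb(i)
        return self.out(p * q).squeeze(-1)           # Step 2: element-wise product

model = GMF(num_users=1000, num_items=2000)
score = model(torch.tensor([3]), torch.tensor([42]))  # predicted ŷ_ui
```

If the weights of the output layer were fixed to all ones, the score would reduce to the dot product of the two latent vectors, i.e., plain matrix factorization.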
MLP-based Matrix Factorization
 Instead of element-wise product, concatenate latent user
and item vectors.
26
[Figure: the one-hot user and item vectors are embedded, the two latent vectors are concatenated (Layer 1), passed through further MLP layers up to Layer X, and a final fully connected layer without bias produces ŷ_ui.]
MLP-based Matrix Factorization
 Learning non-trivial interactions between users and items
27
[Figure: the same MLP architecture, here concatenating the latent user vector [0.6 … 1.2] and the latent item vector [1.2 … 0.3] as its input.]
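A minimal sketch of the MLP variant above: concatenate the two latent vectors and let a small stack of fully connected layers model non-linear user-item interactions. The layer widths are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class MLPRec(nn.Module):
    def __init__(self, num_users, num_items, k=16):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, k)
        self.item_emb = nn.Embedding(num_items, k)
        self.mlp = nn.Sequential(
            nn.Linear(2 * k, 32), nn.ReLU(),   # Layer 1 over the concatenation
            nn.Linear(32, 16), nn.ReLU(),      # intermediate layers
            nn.Linear(16, 1, bias=False),      # Layer X: FC w/o bias -> ŷ_ui
        )
    def forward(self, u, i):
        x = torch.cat([self.user_emb(u), self.item_emb(i)], dim=-1)
        return self.mlp(x).squeeze(-1)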
Fusing Two Solutions
 Neural collaborative filtering (NeuMF) uses both GMF and MLP layers.
28
[Figure: NeuMF architecture: separate MF and MLP embeddings for the user and item; the GMF branch (element-wise product) and the MLP branch (Layer 1 … Layer X over the concatenated vectors) are concatenated in the NeuMF layer and passed through a fully connected layer without bias to produce ŷ_ui.]
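A minimal sketch of the NeuMF-style fusion above (not the authors' code): a GMF branch and an MLP branch with separate embeddings, whose outputs are concatenated and scored by one fully connected layer without bias.

```python
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    def __init__(self, num_users, num_items, k=16):
        super().__init__()
        self.gmf_user = nn.Embedding(num_users, k)   # MF user vector
        self.gmf_item = nn.Embedding(num_items, k)   # MF item vector
        self.mlp_user = nn.Embedding(num_users, k)   # MLP user vector
        self.mlp_item = nn.Embedding(num_items, k)   # MLP item vector
        self.mlp = nn.Sequential(nn.Linear(2 * k, k), nn.ReLU())
        self.out = nn.Linear(2 * k, 1, bias=False)   # NeuMF layer, FC w/o bias
    def forward(self, u, i):
        gmf = self.gmf_user(u) * self.gmf_item(i)                             # GMF branch
        mlp = self.mlp(torch.cat([self.mlp_user(u), self.mlp_item(i)], -1))   # MLP branch
        return self.out(torch.cat([gmf, mlp], dim=-1)).squeeze(-1)            # ŷ_ui

score = NeuMF(1000, 2000)(torch.tensor([3]), torch.tensor([42]))
```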
Example: Product Recommendation
29
https://www.bloter.net/archives/288812
Distributional Hypothesis
 “You shall know a word by the company it
keeps.” by J.R. Firth (1957)
 Words with similar contexts share similar meanings.
 One of the most successful ideas of modern NLP
 What is Tejuino?
 A cup of Tejuino is on the table.
 A woman likes Tejuino.
 Tejuino makes you drunk.
 I usually drink Tejuino every morning.
30
Prod2Vec using Word Embeddings
 How to perform product embedding?
 Adopt the skip-gram model for products.
 Input: i-th product purchased
by the user
 Output: the other product
purchased by the user
31
A set of words → a set of products purchased by the user
Mihajlo Grbovic et al., “E-commerce in Your Inbox: Product Recommendations at Scale,” KDD 2015
Prod2Vec using Word Embeddings
 Imagine the existing user-item rating matrix.
 Word → Movie, a set of words → User
 The window size is ignored.
32
[Figure: applied to the rating matrix, the i-th movie watched by the user is projected and, through a softmax transform, predicts the other movies watched by the user.]
Possible Models of Prod2Vec
33
 Prod2Vec (skip-gram model): the i-th product purchased by the user is projected and, through a softmax transform, predicts the other products purchased by the user.
 User embedding + Prod2Vec: the user and the products other than the i-th one are projected and averaged, and a softmax transform predicts the i-th product purchased by the user.
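A minimal sketch of Prod2Vec with gensim's Word2Vec: treat each user's purchase (or watch) history as a "sentence" of item IDs and train a skip-gram model; a very large window roughly corresponds to ignoring the window size, as on the previous slide. The item IDs and hyperparameters are made up for illustration.

```python
from gensim.models import Word2Vec

# Each "sentence" is one user's list of consumed items.
baskets = [
    ["about_time", "love_rosie", "man_up", "what_if"],
    ["mulan", "antz", "peter_pan", "thumbelina"],
    ["the_machine", "the_signal", "automata", "cargo"],
]
model = Word2Vec(sentences=baskets, vector_size=32, window=1000,
                 sg=1, min_count=1, epochs=50)        # sg=1 -> skip-gram
print(model.wv.most_similar("mulan", topn=3))          # nearest products
```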
34
About Time (2013)
Drama/Fantasy/Coming-of-age
If you could relive any moment, could you achieve perfect love? On the day he comes of age, Tim, who has never had a girlfriend, learns a startling family secret from his father: he can turn back time! To fulfill his dream of finding a girlfriend, Tim moves to London, where he falls for Mary at first sight...
Similar movies (GloVe / Skip-gram results):
Silver Linings Playbook (2012), Drama/Comedy
Secret Life of Walter Mitty, The (2013), Fantasy/Drama
Perks of Being a Wallflower, The (2012), Drama/Coming-of-age
The Theory of Everything (2014), Drama/Romance
What If (2013), Romance/Comedy
Man Up (2015), Drama/Romance
Love, Rosie (2014), Romance/Comedy
Two Night Stand (2014), Romance/Comedy
Mulan (1998)
Animation
A Disney masterpiece with an Eastern atmosphere! Mulan, the only daughter of the Fa family, is rejected at every matchmaking because of her tomboyish personality. When the Huns invade and a conscription order is issued, she disguises herself as a man to take her aging father's place...
Similar movies (GloVe / Skip-gram results):
Jungle Book, The (1967), Animation
Antz (1998), Animation
Lady and the Tramp (1955), Animation
Peter Pan (1953), Animation
Thumbelina (1994), Animation
A Dinosaur's Story (1993), Animation
Quest for Camelot (1998), Animation
Return to Never Land (2002), Animation
35
Machine, The (2013)
Fantasy/SF (robot)
The boundary between human and robot disappears! In a new Cold War era, the killer robot "Machine," Ava, created from human brain data, gradually begins to feel human emotions; led by her, the machines declare a final war on humanity...
Similar movies (GloVe / Skip-gram results):
Signal, The (2014), Thriller/SF (computer)
Zero Theorem, The (2013), Fantasy/Drama (computer)
Autómata (2014), Thriller/Action (robot)
Cargo (2009), Thriller/Mystery (space)
the east (2013), Thriller/Action (spy)
Signal, The (2014), Thriller/SF (computer)
Autómata (2014), Thriller/Action (robot)
Cargo (2009), Thriller/Mystery (space)
36
Session-based Recommendation
 Sequential behavior (global information)
 Last click (local information)
37
Session input Recommendation
GRU4Rec: RNN-based Model
38
Item2Vec
RNN Layer
Balazs Hidasi et al., “Session-based Recommendations with Recurrent Neural Networks”, RecSys Workshop 2015
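A minimal PyTorch sketch of the GRU4Rec idea in the figure: an Item2Vec-style embedding layer, a GRU over the session, and a scoring layer over all items from the last hidden state. Sizes are illustrative, and the original paper's ranking losses are omitted.

```python
import torch
import torch.nn as nn

class GRU4Rec(nn.Module):
    def __init__(self, num_items, k=64, hidden=100):
        super().__init__()
        self.emb = nn.Embedding(num_items, k)            # item embedding layer
        self.gru = nn.GRU(k, hidden, batch_first=True)   # RNN layer
        self.out = nn.Linear(hidden, num_items)          # scores over all items
    def forward(self, session):                          # session: (batch, seq_len)
        _, h = self.gru(self.emb(session))
        return self.out(h[-1])                           # next-item scores

model = GRU4Rec(num_items=5000)
scores = model(torch.tensor([[11, 42, 7, 99]]))          # one session of 4 clicks
```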
NARM: Attention-based Model
 Recommend the next item in a given session.
 Combine global and local information.
 Both are encoded with RNNs.
39
Jing Li et al., “Neural Attentive Session-based Recommendation,” CIKM 2017
Step 1: Global Encoder in NARM
 Capturing user’s sequential behavior
 h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ ĥ_t
 z_t = σ(W_z x_t + U_z h_{t−1}), where z_t is the update gate
 ĥ_t = tanh(W x_t + U(r_t ⊙ h_{t−1}))
 r_t = σ(W_r x_t + U_r h_{t−1}), where r_t is the reset gate
40
Step 2: Local Encoder in NARM
 Capturing user’s main purpose
 α_{tj} = q(h_t, h_j), where h_t is the latent vector for the last item.
 c_t^l = Σ_{j=1}^{t} α_{tj} h_j
 q is an attention scoring function for h_t and h_j: q(h_t, h_j) = v^T σ(A_1 h_t + A_2 h_j)
41
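A minimal PyTorch sketch of the local-encoder attention above: score every hidden state h_j against the last hidden state h_t with q(h_t, h_j) = v^T σ(A_1 h_t + A_2 h_j) and form c_t^l as the weighted sum. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class LocalEncoderAttention(nn.Module):
    def __init__(self, hidden=100):
        super().__init__()
        self.A1 = nn.Linear(hidden, hidden, bias=False)
        self.A2 = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)
    def forward(self, h):                     # h: (batch, t, hidden) GRU states
        h_t = h[:, -1:, :]                    # latent vector of the last item
        alpha = self.v(torch.sigmoid(self.A1(h_t) + self.A2(h)))   # (batch, t, 1)
        return (alpha * h).sum(dim=1)         # c_t^l = sum_j alpha_tj * h_j

c_local = LocalEncoderAttention()(torch.randn(2, 6, 100))
```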
Step 3: Decoder in NARM
 Concatenated vector: c_t = [c_t^g ; c_t^l] = [h_t^g ; Σ_{j=1}^{t} α_{tj} h_j^l]
 Use an alternative bi-linear similarity function.
 S_i = emb_i^T B c_t, where B is a D × |H| matrix.
 D is the dimension of each item embedding.
42
Combining Attention and Memory
 m_s represents the average vector of the session items: m_s = (1/t) Σ_{i=1}^{t} x_i
 m_t is the vector of the last item.
43
Qiao Liu et al., “Short-Term Attention/Memory Priority Model for Session-based Recommendation,” KDD 2018
STAMP: Attention/Memory Priority
 m_a is the weighted sum of the item embedding vectors, where the weights are attention coefficients.
 m_a = Σ_{i=1}^{t} α_i x_i
 Attention coefficient: α_i = W_0 σ(W_1 x_i + W_2 m_s + W_3 x_t + b)
44
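A minimal PyTorch sketch of the STAMP-style attention above: m_s is the mean of the session item embeddings, x_t is the last item's embedding, and the attention coefficients re-weight the items to form m_a. Module and size choices are illustrative.

```python
import torch
import torch.nn as nn

class StampAttention(nn.Module):
    def __init__(self, k=64):
        super().__init__()
        self.W0 = nn.Linear(k, 1, bias=False)
        self.W1 = nn.Linear(k, k, bias=False)
        self.W2 = nn.Linear(k, k, bias=False)
        self.W3 = nn.Linear(k, k, bias=True)   # the bias term plays the role of b
    def forward(self, x):                      # x: (batch, t, k) session item embeddings
        m_s = x.mean(dim=1, keepdim=True)      # average vector of the items
        x_t = x[:, -1:, :]                     # last item (memory)
        alpha = self.W0(torch.sigmoid(self.W1(x) + self.W2(m_s) + self.W3(x_t)))
        return (alpha * x).sum(dim=1)          # m_a = sum_i alpha_i * x_i

m_a = StampAttention()(torch.randn(2, 5, 64))  # batch of 2 sessions, 5 items each
```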
MMCF: Multimodal Collaborative Filtering
for Automatic Playlist Continuation
RecSys Challenge 2018
Team ‘hello world!’ (2nd place), main track
Hojin Yang, Yoonki Jeong, Minjin Choi, and Jongwuk Lee
Sungkyunkwan University, Republic of Korea
Automatic Playlist Continuation
 Million Playlist Dataset (MPD)
46
[Figure: an MPD playlist with its playlist title, the tracks in the playlist, and the metadata of the tracks (artist, album).]
Challenge Set
47
1 2 3 4 5 6 7 8 9 10
# of tracks 0 1 5 10 5 10 25 100 25 100
Title available Yes Yes Yes Yes No No Yes Yes Yes Yes
Track order Seq Seq Seq Seq Seq Seq Seq Seq Shuffled Shuffled
# of playlists 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000
Few tracks in the
first part
Many tracks in the
first part
Many tracks in the
random position
Challenge Set
48
1 2 3 4 5 6 7 8 9 10
# of tracks 0 1 5 10 5 10 25 100 25 100
Title available Yes Yes Yes Yes No No Yes Yes Yes Yes
Track order Seq Seq Seq Seq Seq Seq Seq Seq Shuffled Shuffled
# of playlists 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000
No tracks in the playlist
How to deal with an edge case?
Challenge Set
49
1 2 3 4 5 6 7 8 9 10
# of tracks 0 1 5 10 5 10 25 100 25 100
Title available Yes Yes Yes Yes No No Yes Yes Yes Yes
Track order Seq Seq Seq Seq Seq Seq Seq Seq Shuffled Shuffled
# of playlists 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000
Scarce information
How to treat playlists with scarce information?
Challenge Set
50
1 2 3 4 5 6 7 8 9 10
# of tracks 0 1 5 10 5 10 25 100 25 100
Title available Yes Yes Yes Yes No No Yes Yes Yes Yes
Track order Seq Seq Seq Seq Seq Seq Seq Seq Shuffled Shuffled
# of playlists 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000
Various types of Input
How to deal with various types of input?
Overview of the Proposed Model
 An ensemble method with two components.
 Autoencoder for tracks and metadata for tracks.
 CharCNN for playlist titles.
51
Overview of the Proposed Model
 An ensemble method with two components.
 Autoencoder for tracks and metadata for tracks.
 CharCNN for playlist titles.
52
Autoencoder-based Model
53
 Learn a latent representation of a given playlist consisting of a set of tracks.
[Figure: the playlist is a binary vector over tracks (Hey Jude, Rehab, Yesterday, Dancing Queen, Mamma Mia, Viva la Vida); the encoder-decoder reconstructs a score for every track (e.g., 0.9, 0.01, 0.78, …), and the highest-scoring track is the top-1 recommendation.]
Denoising Autoencoder
54
 Training with denoising
 Some positive input values are corrupted (set to zero).
[Figure: the binary playlist vector (Hey Jude, Rehab, Yesterday, Dancing Queen, Mamma Mia, Viva la Vida) is randomly corrupted before the encoder, and the decoder is trained to reconstruct the original, uncorrupted vector.]
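A minimal sketch of the corruption step described above (not the authors' code): randomly zero out some of the positive entries of the playlist vector before encoding, while the reconstruction target remains the clean vector.

```python
import torch

def corrupt(r, drop_prob=0.5):
    """Set each positive entry of the playlist vector r to zero with prob drop_prob."""
    keep = (torch.rand_like(r) > drop_prob).float()
    return r * keep                      # zeros stay zero; some ones are dropped

playlist = torch.tensor([1., 0., 1., 1., 0., 1.])   # tracks in the playlist
noisy = corrupt(playlist)                # e.g. tensor([1., 0., 0., 1., 0., 1.])
# loss = reconstruction_error(autoencoder(noisy), playlist)  # target is the clean vector
```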
Denoising Autoencoder
55
 Training with denoising
 Some positive input values are corrupted (set to zero).
How to utilize the metadata such
as artists and albums?
[Figure: the same denoising autoencoder as on the previous slide.]
Utilizing Metadata
56
[Figure: the track vector of the playlist (Hey Jude, Rehab, Yesterday, Dancing Queen, Mamma Mia, Viva la Vida) is concatenated with the corresponding artist vector; the autoencoder reconstructs scores for both the tracks and the artists.]
 Concatenate an artist vector corresponding to the track
vector in the playlist.
 Randomly choose either the playlist or its artists as input.
57
Training Strategy: Hide-and-Seek
Training Strategy: Hide-and-Seek
 Randomly choose either the playlist or its artists as input.
58
Training Strategy: Hide-and-Seek
 Randomly choose either the playlist or its artists as input.
59
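A minimal, hypothetical sketch of the hide-and-seek strategy described above: for each training example, randomly keep either the track part or the artist part of the concatenated input and hide the other. The exact corruption scheme used by the authors may differ.

```python
import random
import torch

def hide_and_seek(tracks, artists):
    """tracks, artists: binary vectors; return the concatenated input with one part hidden."""
    if random.random() < 0.5:
        artists = torch.zeros_like(artists)   # keep the playlist tracks, hide the artists
    else:
        tracks = torch.zeros_like(tracks)     # keep the artists, hide the tracks
    return torch.cat([tracks, artists])

x = hide_and_seek(torch.tensor([1., 0., 1., 1.]), torch.tensor([1., 0., 1.]))
```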
CharCNN for Playlist Titles
 An ensemble method with two components.
 Autoencoder for tracks and metadata for tracks.
 CharCNN for playlist titles.
60
Word-level CNN for NLP
 Effective for capturing spatial locality of a sequence of texts
61
[Figure: the sentence "I like this song very much" is mapped to k-dimensional word embeddings; a 3-by-k convolution filter slides over the words, and max pooling over the resulting feature map (e.g., 2.2, 2.3, -1.3, 0.9) keeps a single feature value (2.3).]
Word-level CNN for NLP
 Effective for capturing spatial locality of a sequence of texts
62
[Figure: with multiple 3-by-k filters, each convolution produces its own feature map, max pooling keeps one value per map, and the pooled values together form a feature vector (e.g., [2.3, 1.2, 2.4]).]
CharCNN for Playlist Titles
 Playlist titles are represented by a short text, implying
an abstract description of a playlist.
 Use character-level embedding.
63
[Figure: character-level convolution layers over the playlist title produce its feature vector.]
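A minimal PyTorch sketch of a character-level CNN for short playlist titles, in the spirit of the slides: embed the characters, apply 1-D convolutions of a few widths, and max-pool each feature map into one value. The vocabulary size, filter widths, and the toy character encoding are illustrative.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    def __init__(self, vocab_size=128, k=16, num_filters=32, widths=(2, 3, 4)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, k)                       # character embedding
        self.convs = nn.ModuleList([nn.Conv1d(k, num_filters, w) for w in widths])
    def forward(self, chars):                    # chars: (batch, title_len) character ids
        x = self.emb(chars).transpose(1, 2)      # (batch, k, title_len)
        feats = [conv(x).relu().max(dim=-1).values for conv in self.convs]  # max pooling
        return torch.cat(feats, dim=-1)          # title feature vector

title = torch.tensor([[ord(c) for c in "chill songs"]])   # toy character encoding
vec = CharCNN()(title)                                    # (1, 96) feature vector
```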
Combining Two Models
 Simplest method: w_item = 0.5 and w_title = 0.5
64
Combining Two Models
 The accuracy of the AE depends heavily on the number of tracks within a playlist.
 Dynamic: Set weights according to the number of items.
65
[Figure: for a playlist with items and the title "Chill songs", the AE produces item-based scores and the CNN produces title-based scores; the final scores are a weighted combination, e.g., w_item = 5 and w_title = 1.]
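A minimal sketch of the dynamic weighting idea above: blend the autoencoder (item-based) scores and the CharCNN (title-based) scores, with the item weight growing with the number of tracks already in the playlist. The weighting rule and the score values are hypothetical.

```python
import numpy as np

def combine(ae_scores, cnn_scores, num_tracks):
    # Hypothetical weighting rule: more observed tracks -> trust the AE more.
    w_item = min(float(num_tracks), 5.0)
    w_title = 1.0
    return (w_item * ae_scores + w_title * cnn_scores) / (w_item + w_title)

ae = np.array([0.7, 0.4, 0.9, 0.1, 0.2])    # AE scores for candidate tracks
cnn = np.array([0.1, 0.2, 0.3, 0.7, 0.1])   # CharCNN scores from the title
print(combine(ae, cnn, num_tracks=25))      # final scores used for ranking
```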
RecSys Challenge 2018
66
Recent Progress in Our Lab
 “Dual Neural Personalized Ranking,” WWW 2019
 “Characterization and Early Detection of Evergreen News
Articles,” ECML/PKDD 2019 (To appear)
 “Collaborative Distillation for Top-N Recommendation,”
ICDM 2019 (To appear)
67
Q&A
 https://jongwuklee.weebly.com/
68
SR-GNN using Graph Neural Nets
 Each session graph is processed one by one, and node vectors are obtained through a gated graph neural network.
 Each session is represented as the combination of the global preference and the current interest of that session, using an attention network.
69
Shu Wu et al., “Session-based Recommendation with Graph Neural Networks,” AAAI 2019
RecSys Challenge 2018
70