Ask Me Any Rating: A Content-based
Recommender System based on
Recurrent Neural Networks
7th Italian Information Retrieval Workshop
Venezia (Italy), May 30-31 2016
Cataldo Musto, Claudio Greco, Alessandro Suglia and Giovanni Semeraro
Work supported by the IBM Faculty Award “Deep Learning to boost Cognitive Question Answering”
Titan X GPU used for this research donated by the NVIDIA Corporation
Overview
1. Background
Content-based recommender systems
Neural network models
2. Research work
Ask Me Any Rating (AMAR)
Experimental evaluation
3. Conclusions
Lessons learnt
Vision
Background
Content-based recommender systems
Content-based recommendation consists of matching up the attributes of a user
profile with the attributes of a content object (item) [1]
[1] P. Lops, M. De Gemmis, and G. Semeraro. “Content-based recommender systems:
State of the art and trends”. In: Recommender systems handbook. Springer, 2011
Deep learning
Definition
Allows computational models that are composed of
multiple processing layers to learn representations of data
with multiple levels of abstraction [2]
• Discovers intricate structure in large data sets by using the
backpropagation algorithm [3];
• Leads to progressively more abstract features at higher layers of
representation;
• More abstract concepts are generally invariant to most local
changes of the input.
[2] Y. LeCun, Y. Bengio, and G. Hinton. “Deep learning”. In: Nature 521 (2015)
[3] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. “Learning representations by
back-propagating errors”. In: Cognitive modeling (1988)
Recurrent Neural Networks
• Recurrent Neural Networks (RNN) are architectures suitable for
modeling variable-length sequential data [4];
• The connections between their units may contain loops which
let them consider past states in the learning process;
• Their roots are in dynamical systems theory, in which the
following relation holds:

s(t) = f(s(t−1), x(t); θ)

where s(t) represents the current system state computed by a
generic function f evaluated on the previous state s(t−1), x(t)
represents the current input and θ are the network parameters.
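A minimal numpy sketch of this recurrence, assuming f = tanh(Ws + Ux + b); the parameter shapes and random inputs below are purely illustrative:

import numpy as np

# Illustrative parameters theta = (W, U, b) for the recurrence above.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))   # state-to-state weights
U = rng.normal(scale=0.1, size=(4, 3))   # input-to-state weights
b = np.zeros(4)

s = np.zeros(4)                          # initial state s(0)
for x in rng.normal(size=(5, 3)):        # a sequence of 5 inputs x(t)
    s = np.tanh(W @ s + U @ x + b)       # s(t) = f(s(t-1), x(t); theta)
print(s)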
[4] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations
by error propagation. Tech. rep. DTIC Document, 1985
RNN pros and cons
Pros
• Appropriate to represent sequential data;
• A versatile framework which can be applied to different tasks;
• Can learn short-term and long-term temporal dependencies.
Cons
• Vanishing/exploding gradient problem [5];
• Difficulty in reaching satisfactory minima during the optimization of
the loss function;
• The training process is difficult to parallelize.
[5] Y. Bengio, P. Simard, and P. Frasconi. “Learning long-term dependencies with
gradient descent is difficult”. In: Neural Networks, IEEE Transactions on 5 (1994)
Long Short Term Memory (LSTM)
• A specific RNN introduced to solve the vanishing/exploding
gradient problem;
• Each cell presents a complex structure which is more powerful
than simple RNN cells.
Figure: LSTM architecture [6]
forget gate (f): considers the current input and the previous state
to remove or preserve the most appropriate information for the
given task
input gate (i): considers the current input and the previous state
to determine how the input information will be used to update the
cell state
output gate (o): considers the current input, the previous state
and the updated cell state to generate an appropriate output for
the given task
[6] A. Graves, A. Mohamed, and G. Hinton. “Speech recognition with deep recurrent
neural networks”. In: Acoustics, Speech and Signal Processing (ICASSP), IEEE 2013
Research work
Ask Me Any Rating (AMAR)
“Mirror, mirror, here I stand.
What is the fairest movie in the
land?”
• Inspired by a neural network model
used to solve Question Answering
toy tasks [7];
• Name adapted from “Ask Me
Anything” [8];
• A very simple factoid Question
Answering setting in which user
profiles are the questions and
ratings are the answers.
[7] J. Weston et al. “Towards AI-Complete Question Answering: A Set of Prerequisite
Toy Tasks”. In: CoRR abs/1502.05698 (2015)
[8] A. Kumar et al. “Ask Me Anything: Dynamic Memory Networks for Natural
Language Processing”. In: CoRR abs/1506.07285 (2015)
Ask Me Any Rating (AMAR)
• Two different modules to
generate:
• User embedding
• Item embedding
• The user embedding is associated
with a user identifier;
• The item embedding is generated
from an item description;
• Concatenation of user and
item embeddings given to a
logistic regression layer to
predict the probability of a
“like”.
Figure: AMAR architecture. User u feeds the User LT, which produces
v(u); the words w1 … wm of the item description id feed the Word LT,
whose embeddings v(w1) … v(wm) pass through an LSTM producing
h(w1) … h(wm); a mean pooling layer yields v(id); a concatenation
layer joins v(u) and v(id), followed by a logistic regression layer.
Ask Me Any Rating (AMAR)
User embedding
• An identifier u is associated with each user;
• The identifier is given as input to a lookup table (User LT);
• User LT converts it to a learnt user embedding v(u).
Item embedding
• Each word w1 . . . wm of the item description id is associated with a
unique identifier specific to the item descriptions corpus;
• Word identifiers are given as input to a lookup table (Item LT);
• Item LT converts them to learnt word embeddings v(wk);
• The word embeddings v(wk) are sequentially passed through an
RNN with LSTM cells (the LSTM module);
• The LSTM module generates a latent representation h(wk) for
each word;
• A mean pooling layer averages the word representations,
generating an item embedding v(id) for item i.
Ask Me Any Rating (AMAR)
“Like” probability estimation
• Item and user embeddings, v(id) and v(u), are concatenated into a
single representation;
• The resulting representation is used as the feature vector for the
prediction task;
• A logistic regression layer is used to estimate the probability of
a “like” given by user u to a specific item i;
• The generated score is used to build a sorted list of
recommended items for user u.
Optimization criterion
• The neural network is trained by minimizing the binary
cross-entropy loss function.
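The slides above fully specify the forward pass, so a compact sketch is possible. Below is a minimal PyTorch rendering of AMAR, assuming the embedding and LSTM sizes from the configuration slide (10); all names, shapes and the toy data are illustrative, not the authors' original implementation.

import torch
import torch.nn as nn

class AMAR(nn.Module):
    # Minimal sketch of the AMAR forward pass described above.
    # Sizes follow the configuration slide (embedding/LSTM size 10);
    # names and initialization are illustrative.
    def __init__(self, n_users, vocab_size, emb_size=10):
        super().__init__()
        self.user_lt = nn.Embedding(n_users, emb_size)      # User LT
        self.word_lt = nn.Embedding(vocab_size, emb_size)   # Item (word) LT
        self.lstm = nn.LSTM(emb_size, emb_size, batch_first=True)
        self.out = nn.Linear(2 * emb_size, 1)               # logistic regression layer

    def forward(self, user_ids, item_descriptions):
        v_u = self.user_lt(user_ids)                  # v(u): (batch, emb)
        v_w = self.word_lt(item_descriptions)         # v(w_k): (batch, words, emb)
        h, _ = self.lstm(v_w)                         # h(w_k): (batch, words, emb)
        v_id = h.mean(dim=1)                          # mean pooling layer -> v(id)
        z = torch.cat([v_u, v_id], dim=1)             # concatenation layer
        return torch.sigmoid(self.out(z)).squeeze(1)  # P("like")

# Training minimizes binary cross-entropy, as stated above (toy example):
model = AMAR(n_users=6040, vocab_size=50000)
users = torch.tensor([0, 1])
descs = torch.randint(0, 50000, (2, 12))   # two toy item descriptions
likes = torch.tensor([1.0, 0.0])
loss = nn.BCELoss()(model(users, descs), likes)
loss.backward()

The predicted probabilities would then be sorted per user to produce the top-N recommendation list.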
AMAR extended
• AMAR extended adds to the AMAR
architecture an additional module
for item genres;
• An identifier gk is associated with
each item genre;
• Genre identifiers are given as input
to a lookup table (Genre LT);
• Genre LT converts them to learnt
genre embeddings v(gk);
• A mean pooling layer averages the
genre representations, generating a
genres embedding v(ig).
Figure: AMAR extended architecture. In addition to user u → v(u) and
item description id → v(id), the item genres g1 … gn of igj feed the
Genre LT, producing v(g1) … v(gn); a mean pooling layer averages them
into v(ig); the concatenation layer then joins v(u), v(id) and v(ig)
before the logistic regression layer.
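A sketch of just the extra genre module, under the same illustrative assumptions as the AMAR sketch above (vocabulary size and genre ids are made up for the example):

import torch
import torch.nn as nn

# Genre identifiers -> Genre LT -> mean pooling -> v(ig); names illustrative.
genre_lt = nn.Embedding(num_embeddings=20, embedding_dim=10)  # Genre LT
genres = torch.tensor([[2, 5, 7]])  # one item with three genre ids
v_g = genre_lt(genres)              # genre embeddings v(g_k): (1, 3, 10)
v_ig = v_g.mean(dim=1)              # pooled genre embedding v(ig): (1, 10)
# v_ig is concatenated with v(u) and v(id) before the logistic regression layer.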
Experimental protocol
• Datasets: Movielens 1M (ML1M) and DBbook;
• Text preprocessing: tokenization and stopword removal;
• Evaluation strategy: 5-fold cross validation for Movielens 1M,
holdout for DBbook;
• Recommendation task: top-N recommendation leveraging
binary user feedback;
• Evaluation strategy for recommendation: TestRatings [9];
• Metric: F1-measure evaluated at 5, 10 and 15 (a minimal F1@N sketch
follows below).
[9] A. Bellogin, P. Castells, and I. Cantador. “Precision-oriented evaluation of
recommender systems: an algorithmic comparison”. In: Proceedings of the fifth ACM
conference on Recommender systems. 2011
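As a minimal illustration of the metric, here is plain F1 at a cutoff N for a single user; the exact candidate selection of the TestRatings protocol is defined in [9] and is not reproduced here.

def f1_at_n(ranked_items, relevant_items, n):
    # Plain F1-measure at cutoff n for one user.
    top_n = ranked_items[:n]
    hits = len(set(top_n) & set(relevant_items))
    precision = hits / n
    recall = hits / len(relevant_items) if relevant_items else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 2 of the top-5 recommendations are relevant.
print(f1_at_n(["a", "b", "c", "d", "e"], ["b", "e", "x"], n=5))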
ML1M
A film dataset created by the GroupLens research group at the
University of Minnesota, containing user ratings on a 5-star scale.
Each rating has been binarized according to the following formula:
bin_rating(r) = 1 if r ≥ 4, 0 otherwise
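In code, the binarization is a one-liner:

def bin_rating(r):
    # Positive iff the 1-5 star rating is at least 4, as in the formula above.
    return 1 if r >= 4 else 0

print([bin_rating(r) for r in (1, 3, 4, 5)])  # [0, 0, 1, 1]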
#ratings: 1,000,209
#users: 6,040
#items: 3,301
avg ratings per user: 31.423
avg positive ratings per user: 17.985
avg negative ratings per user: 13.439
sparsity: 0.95
DBbook
A book dataset released for the Linked open data-enabled
recommender systems: ESWC 2014 challenge [10].
It contains binary user preferences (e.g., I like it, I don’t like it).
#ratings: 72,371
#users: 6,181
#items: 8,170
avg ratings per user: 11.392
avg positive ratings per user: 6.727
avg negative ratings per user: 4.665
sparsity: 0.998
[10] T. Di Noia, I. Cantador, and V. C. Ostuni. “Linked open data-enabled
recommender systems: ESWC 2014 challenge on book recommendation”.
In: Semantic Web Evaluation Challenge. Springer, 2014
Models configurations
Embedding-based recommenders
W2V Google News (W2V-News)
• Method: SG (skip-gram)
• Embedding size: 300
• Corpus: Google News
GloVe
• Embedding size: 300
• Corpus: Wikipedia 2014 +
Gigaword 5
Baseline recommenders
Item to item CF (I2I) *
• Neighbours: 30, 50, 80
User to user CF (U2U) *
• Neighbours: 30, 50, 80
SLIM with BPR-Opt (BPRSlim) *
TF-IDF
Bayesian Personalized Ranking
Matrix Factorization (BPRMF) *
• Latent factors: 10, 30, 50
Weighted Regularized Matrix
Factorization (WRMF) *
• Latent factors: 10, 30, 50
* MyMediaLite implementations
Models configurations
AMAR
• Opt. method: RMSprop [11]
• α: 0.9
• Learning rate: 0.001
• Epochs: 25;
• User embedding size: 10;
• Item embedding size: 10;
• LSTM output size: 10;
• Batch size:
• ML1M: 1536
• DBbook: 512
AMAR extended
• Opt. method: RMSprop
• α: 0.9
• Learning rate: 0.001
• Epochs: 25;
• User embedding size: 10;
• Item embedding size: 10;
• Genre embedding size: 10;
• LSTM output size: 10;
• Batch size:
• ML1M: 1536
• DBbook: 512
[11] T. Tieleman and G. E. Hinton. “rmsprop”. In: COURSERA: Neural Networks for
Machine Learning Lecture 6.5 (2012)
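For reference, the optimizer configuration above maps directly onto PyTorch's RMSprop, where alpha is the smoothing constant of the squared-gradient moving average; the parameter list here is a placeholder:

import torch

params = [torch.zeros(10, requires_grad=True)]  # placeholder parameters
optimizer = torch.optim.RMSprop(params, lr=0.001, alpha=0.9)  # slide settings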
DBbook results
F1@10 per recommender configuration:

AMAR: 0.662
AMAR extended: 0.662
GloVe: 0.655
W2V-News: 0.656
I2I-30: 0.640
U2U-30: 0.639
BPRMF-30: 0.631
WRMF-50: 0.636
BPRSlim: 0.632
TF-IDF: 0.662

Differences statistically significant according to Wilcoxon test (p ≤ 0.05)
ML1M results
F1@10 per recommender configuration:

AMAR: 0.641
AMAR extended: 0.644
GloVe: 0.575
W2V-News: 0.587
I2I-30: 0.527
U2U-30: 0.525
BPRMF-30: 0.524
WRMF-50: 0.525
BPRSlim: 0.548
TF-IDF: 0.590

Only the differences between U2U and GloVe, BPRSlim and GloVe, and GloVe and
Word2vec are not statistically significant according to Wilcoxon test (p ≤ 0.05)
Conclusions
AMAR pros and cons
Pros
• Large improvement on ML1M;
• Able to learn more suitable item and user representations for
the recommendation task;
• Item and user embeddings are not generated using a simple
mean, but they are adapted during training.
Cons
• It does not deal well with very sparse datasets:
• Small improvement on DBbook
• High training times:
• DBbook: 50 minutes per epoch
• ML1M: 90 minutes per epoch
AMAR Improvements
Optimization
• Use alternative training methods and regularization techniques;
• Use pretrained word embeddings;
• More appropriate cost functions for top-N recommendation;
• Increase embedding dimensions.
Architecture
• Item modeling may be improved by using different neural
network architectures;
• Classification step may be done by using deeper fully connected
layers.
Additional features
Leverage important data silos to enrich item representations:
• Linked Open Data;
• Web and social media.
Thanks for your attention
• Design of recommender systems
using deep neural networks;
• Experimental evaluation on
well-known datasets on the
top-N recommendation task;
• Higher performance using deep
models than using shallow
models.
Alessandro Suglia
alessandro.suglia@gmail.com
Claudio Greco
claudiogaetanogreco@gmail.com
Technical details
(Warning: for geeks only)
Cross entropy
Definition
Given two probability distributions over the same underlying set of
events, p and q, it measures the average number of bits needed to
identify an event drawn from a set of possibilities, if a coding
scheme is used based on an “unnatural” probability distribution q,
rather than the “true” distribution p.
Given discrete probability distributions p and q, the cross entropy is
defined as follows:

H(p, q) = − ∑_x p(x) log q(x)
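A tiny numpy check of the definition on two toy distributions (base-2 logarithm, so the result is in bits):

import numpy as np

p = np.array([0.5, 0.25, 0.25])  # "true" distribution
q = np.array([0.4, 0.4, 0.2])    # "unnatural" coding distribution
H = -np.sum(p * np.log2(q))      # H(p, q) from the definition above
print(H)                         # larger than the entropy H(p, p), since q != p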
RNN
Given an input vector xt, bias vectors b, c and weight matrices U, V
and W, a forward step of an RNN is computed in this way:
at = b + Wst−1 + Uxt
st = tanh(at)
ot = c + Vst
pt = softmax(ot)
In this case, the activation functions are the hyperbolic tangent
(tanh) for the hidden layer and the multinomial logistic function
(softmax) for the output layer.
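A direct numpy transcription of these four equations, with illustrative shapes and random parameters:

import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

rng = np.random.default_rng(1)
U = rng.normal(size=(4, 3))  # input-to-hidden weights
W = rng.normal(size=(4, 4))  # hidden-to-hidden weights
V = rng.normal(size=(2, 4))  # hidden-to-output weights
b, c = np.zeros(4), np.zeros(2)

s_prev, x_t = np.zeros(4), rng.normal(size=3)
a_t = b + W @ s_prev + U @ x_t  # at = b + W st-1 + U xt
s_t = np.tanh(a_t)              # st = tanh(at)
o_t = c + V @ s_t               # ot = c + V st
p_t = softmax(o_t)              # pt = softmax(ot)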
LSTM
The information flow in an LSTM module is much more complex than
the one in a simple RNN. The architecture used in this work uses the
following equations, presented in [6]:

i_t = σ(W_{xi} x_t + W_{hi} h_{t−1} + W_{ci} c_{t−1} + b_i)
f_t = σ(W_{xf} x_t + W_{hf} h_{t−1} + W_{cf} c_{t−1} + b_f)
c_t = f_t c_{t−1} + i_t tanh(W_{xc} x_t + W_{hc} h_{t−1} + b_c)
o_t = σ(W_{xo} x_t + W_{ho} h_{t−1} + W_{co} c_t + b_o)
h_t = o_t tanh(c_t)

where σ is the logistic sigmoid function, and i, f, o and c are
respectively the input gate, forget gate, output gate and cell
activation vectors, all of which are the same size as the hidden
vector h.
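A numpy transcription of one LSTM step under these equations; following [6], the W_c* peephole weights are diagonal, so they appear here as vectors used elementwise, and all shapes and the random initialization are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    # One step of the LSTM equations above; P holds the W_* matrices and b_* biases.
    i = sigmoid(P["Wxi"] @ x_t + P["Whi"] @ h_prev + P["Wci"] * c_prev + P["bi"])
    f = sigmoid(P["Wxf"] @ x_t + P["Whf"] @ h_prev + P["Wcf"] * c_prev + P["bf"])
    c = f * c_prev + i * np.tanh(P["Wxc"] @ x_t + P["Whc"] @ h_prev + P["bc"])
    o = sigmoid(P["Wxo"] @ x_t + P["Who"] @ h_prev + P["Wco"] * c + P["bo"])
    h = o * np.tanh(c)
    return h, c

n_in, n_h = 3, 4
rng = np.random.default_rng(2)
P = {k: rng.normal(scale=0.1, size=(n_h, n_in)) for k in ("Wxi", "Wxf", "Wxc", "Wxo")}
P.update({k: rng.normal(scale=0.1, size=(n_h, n_h)) for k in ("Whi", "Whf", "Whc", "Who")})
P.update({k: rng.normal(scale=0.1, size=n_h) for k in ("Wci", "Wcf", "Wco", "bi", "bf", "bc", "bo")})
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), P)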
Corpus stats
Google News
• # tokens: 100B
• Vocabulary size: 3M
• # matched words:
• DBbook: 44636 (41.52%)
• ML1M: 35150 (49.13%)
GloVe
• # tokens: 6B
• Vocabulary size: 400K
• # matched words:
• DBbook: 65013 (60.48%)
• ML1M: 49893 (69.74%)
References
[1] Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro.
“Content-based recommender systems: State of the art and
trends”. In: Recommender systems handbook. Springer, 2011.
[2] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep
learning”. In: Nature 521 (2015).
[3] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams.
“Learning representations by back-propagating errors”. In:
Cognitive modeling 5 (1988).
[4] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams.
Learning internal representations by error propagation.
Tech. rep. DTIC Document, 1985.
[5] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. “Learning
long-term dependencies with gradient descent is difficult”. In:
Neural Networks, IEEE Transactions on 5 (1994).
[6] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton.
“Speech recognition with deep recurrent neural networks”. In:
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE
International Conference on. IEEE. 2013.
[7] Jason Weston et al. “Towards AI-Complete Question Answering:
A Set of Prerequisite Toy Tasks”. In: CoRR abs/1502.05698 (2015).
[8] Ankit Kumar et al. “Ask Me Anything: Dynamic Memory
Networks for Natural Language Processing”. In: CoRR
abs/1506.07285 (2015).
[9] Alejandro Bellogin, Pablo Castells, and Ivan Cantador.
“Precision-oriented evaluation of recommender systems: an
algorithmic comparison”. In: Proceedings of the fifth ACM
conference on Recommender systems. 2011.
[10] Tommaso Di Noia, Iván Cantador, and Vito Claudio Ostuni.
“Linked open data-enabled recommender systems: ESWC 2014
challenge on book recommendation”. In: Semantic Web
Evaluation Challenge. Springer, 2014.
[11] Tijmen Tieleman and Geoffrey E. Hinton. “rmsprop”. In:
COURSERA: Neural Networks for Machine Learning Lecture 6.5
(2012).