Iterative Multi-document Neural Attention
for Multiple Answer Prediction
URANIA Workshop
Genova (Italy), November, 28th, 2016
Claudio Greco, Alessandro Suglia, Pierpaolo Basile, Gaetano Rossiello and
Giovanni Semeraro
Work supported by the IBM Faculty Award "Deep Learning to boost Cognitive Question Answering"
Titan X GPU used for this research donated by the NVIDIA Corporation
Overview
1. Motivation
2. Methodology
3. Experimental evaluation
4. Conclusions and Future Work
5. Appendix
Motivation
Motivation
• People have information needs of varying complexity, such as:
• simple questions about common facts (Question Answering)
• suggestions for a movie to watch on a romantic evening (Recommendation)
• An intelligent agent able to answer properly formulated questions can address both, possibly taking into account:
• user context
• user preferences
Idea
In a scenario in which the user profile can be represented as a
question, intelligent agents able to answer questions can be used
to find the most appealing items for a given user
Motivation
Conversational Recommender Systems (CRS)
Assist online users in their information-seeking and decision-making
tasks by supporting an interactive process [1]. This process can be
goal-oriented: it starts general and, through a series of interaction
cycles, narrows down the user's interests until the desired item is
obtained [2].
[1]: T. Mahmood and F. Ricci. “Improving recommender systems with adaptive
conversational strategies”. In: Proceedings of the 20th ACM conference on Hypertext
and hypermedia. ACM. 2009.
[2]: N. Rubens et al. “Active learning in recommender systems”. In: Recommender
Systems Handbook. Springer, 2015.
Methodology
Building blocks for a CRS
According to our vision, to implement a CRS we should design the
following building blocks:
1. Question Answering + recommendation
2. Answer explanation
3. Dialog manager
Our work called “Iterative Multi-document Neural Attention for
Multiple Answer Prediction” tries to tackle building block 1.
Iterative Multi-document Neural Attention
for Multiple Answer Prediction
The key contributions of this work are the following:
1. We extend the model reported in [3] so that the inference process
can exploit evidence observed in multiple documents
2. We design a model able to leverage the attention weights
generated by the inference process to provide multiple answers
3. We assess the efficacy of our model through an experimental
evaluation on the Movie Dialog [4] dataset
[3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for
Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016)
[4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog
systems”. In: arXiv preprint arXiv:1511.06931 (2015).
Iterative Multi-document Neural Attention
for Multiple Answer Prediction
Given a query q, the operator ψ : Q → 2^D produces the set of
documents relevant to q, where Q is the set of all queries and D is
the set of all documents.
Our model defines a workflow in which a sequence of inference
steps are performed:
1. Encoding phase
2. Inference phase
• Query attentive read
• Document attentive read
• Gating search results
3. Prediction phase
Encoding phase
Both queries and documents are represented as sequences of words
X = (x1, x2, . . . , x|X|) drawn from a vocabulary V. Each word is
represented by a continuous d-dimensional word embedding x ∈ R^d
stored in a word embedding matrix X ∈ R^(|V|×d).
Documents and query are encoded using a bidirectional recurrent
neural network with Gated Recurrent Units (GRU) as in [3].
Differently from [3], we build a single representation for the whole
set of documents related to the query by stacking the token
representations of each document produced by the bidirectional GRU.
[3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for
Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016)
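As a toy sketch of this stacking step (not the authors' TensorFlow implementation; `encode` is a hypothetical stand-in for the bidirectional GRU and simply returns embedding rows):

```python
import numpy as np

d = 4  # embedding size (toy value)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10, d))  # toy word-embedding matrix, |V| = 10

def encode(token_ids):
    # placeholder for the bidirectional GRU encoder: one vector per token
    return embeddings[token_ids]

# two "documents" as token-id sequences of different lengths
docs = [[1, 3, 5], [2, 4, 6, 8]]

# single representation of the whole document set: stack all token encodings
doc_matrix = np.vstack([encode(doc) for doc in docs])
print(doc_matrix.shape)  # (7, 4): one row per token across all documents
```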
Inference phase
This phase uncovers a possible inference chain which models
meaningful relationships between the query and the set of related
documents. The inference chain is obtained by performing, for each
timestep t = 1, 2, . . . , T, the attention mechanisms given by the query
attentive read and the document attentive read.
• query attentive read: performs an attention mechanism over the
query at inference step t conditioned by the inference state
• document attentive read: performs an attention mechanism
over the documents at inference step t conditioned by the
refined query representation and the inference state
• gating search results: updates the inference state so as to retain
information about the query and documents that is useful for the
inference process, and to forget useless information
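The three steps above can be sketched as a minimal numpy loop; the dot-product attention and the elementwise sigmoid gate are simplifying assumptions standing in for the parametrized mechanisms of [3]:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_read(memory, state):
    # dot-product attention: weight each row of `memory` by similarity to `state`
    weights = softmax(memory @ state)
    return weights @ memory, weights

rng = np.random.default_rng(1)
d = 4
query_enc = rng.normal(size=(3, d))  # encoded query tokens
doc_enc = rng.normal(size=(7, d))    # stacked document token encodings
state = np.zeros(d)                  # inference state s_0

T = 3
for t in range(T):
    q_t, _ = attentive_read(query_enc, state)          # query attentive read
    d_t, doc_w = attentive_read(doc_enc, q_t + state)  # document attentive read
    gate = 1 / (1 + np.exp(-(q_t + d_t)))              # toy gate (stand-in for a GRU update)
    state = gate * state + (1 - gate) * (q_t + d_t)    # gating search results

print(doc_w.shape)  # one attention weight per document token
```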
Inference phase
[3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for
Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016)
Prediction phase
• Leverages the document attention weights computed at the last
inference step T to generate a relevance score for each
candidate answer
• Relevance scores for each token coming from the l different
documents Dq related to the query q are accumulated
score(w) = (1 / π(w)) · Σ_{i=1}^{l} ϕ(i, w)
where:
• ϕ(i, w) returns the score associated to the word w in document i
• π(w) returns the frequency of the word w in Dq
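A minimal sketch of this accumulation, with hypothetical ϕ values for two documents:

```python
from collections import Counter

# toy attention scores phi[i][w] for word w in document i (hypothetical values)
phi = [
    {"larenz": 0.9, "tate": 0.8, "inkwell": 0.7},
    {"larenz": 0.6, "tate": 0.5, "postman": 0.4},
]

# pi(w): frequency of word w across the documents related to the query
pi = Counter(w for doc in phi for w in doc)

def score(w):
    # accumulate the per-document scores and normalize by frequency
    return sum(doc.get(w, 0.0) for doc in phi) / pi[w]

print(score("larenz"))  # (0.9 + 0.6) / 2 = 0.75
```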
Prediction phase
• A 2-layer feed-forward neural network is used to learn latent
relationships between tokens in documents
• The output layer of the neural network generates a score for
each candidate answer using a sigmoid activation function
z = [score(w1), score(w2), . . . , score(w|V|)]
y = sigmoid(Who relu(Wih z + bih) + bho)
where:
• u is the hidden layer size
• Wih ∈ R^(u×|V|) and Who ∈ R^(|A|×u) are weight matrices
• bih ∈ R^u and bho ∈ R^(|A|) are bias vectors
• sigmoid(x) = 1 / (1 + e^(−x)) is the sigmoid function
• relu(x) = max(0, x) is the ReLU activation function
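Under the toy dimensions below (all values hypothetical, not the trained weights), the prediction network can be sketched as:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, A, u = 6, 3, 5  # toy vocabulary, answer-set and hidden-layer sizes
rng = np.random.default_rng(2)
W_ih = rng.normal(size=(u, V))  # input-to-hidden weights
W_ho = rng.normal(size=(A, u))  # hidden-to-output weights
b_ih = np.zeros(u)
b_ho = np.zeros(A)

z = rng.random(V)  # vector of accumulated token scores
y = sigmoid(W_ho @ relu(W_ih @ z + b_ih) + b_ho)
print(y.shape)  # one independent sigmoid score per candidate answer
```

The sigmoid output layer scores each candidate answer independently, which is what allows the model to return multiple answers for one question.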
Experimental evaluation
Movie Dialog
The bAbI Movie Dialog [4] dataset is composed of different tasks, such as:
• factoid QA (QA)
• top-n recommendation (Recs)
• QA + recommendation in a dialog fashion
• turns of dialog taken from Reddit
[4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog
systems”. In: arXiv preprint arXiv:1511.06931 (2015).
Experimental evaluation
• Differently from [4], the relevant knowledge base facts,
represented in triple form, are retrieved by ψ, implemented
using the Elasticsearch engine
• Evaluation metrics:
• QA task: HITS@1
• Recs task: HITS@100
• The optimization method and tricks are adopted from [3]
• The model is implemented in TensorFlow [5] and executed on an
NVIDIA TITAN X GPU
[3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for
Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016)
[4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog
systems”. In: arXiv preprint arXiv:1511.06931 (2015).
[5]: M. Abadi et al. “TensorFlow: Large-Scale Machine Learning on Heterogeneous
Distributed Systems”. In: CoRR abs/1603.04467 (2016).
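The HITS@k metric itself is simple to state; a minimal sketch with hypothetical rankings and ground-truth answers:

```python
def hits_at_k(ranked_answers, ground_truth, k):
    # 1 if any correct answer appears among the top-k ranked candidates, else 0
    return int(any(a in ground_truth for a in ranked_answers[:k]))

ranking = ["love jones", "the postman", "titanic"]
truth = {"the postman", "dead presidents"}
print(hits_at_k(ranking, truth, 1))  # 0: the top-1 answer is wrong
print(hits_at_k(ranking, truth, 2))  # 1: a correct answer is in the top 2
```

Averaging this indicator over the test questions gives HITS@1 for the QA task and HITS@100 for the Recs task.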
Experimental evaluation
METHODS                       QA TASK   RECS TASK
QA SYSTEM                     90.7      N/A
SVD                           N/A       19.2
IR                            N/A       N/A
LSTM                          6.5       27.1
SUPERVISED EMBEDDINGS         50.9      29.2
MEMN2N                        79.3      28.6
JOINT SUPERVISED EMBEDDINGS   43.6      28.1
JOINT MEMN2N                  83.5      26.5
OURS                          86.8      30
Table 1: Comparison between our model and baselines from [4] on the QA
and Recs tasks evaluated according to HITS@1 and HITS@100, respectively.
[4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog
systems”. In: arXiv preprint arXiv:1511.06931 (2015).
Inference phase attention weights
Question:
what does Larenz Tate act in ?
Ground truth answers:
The Postman, A Man Apart, Dead Presidents, Love Jones, Why Do Fools Fall in Love, The Inkwell
Most relevant sentences:
• The Inkwell starred actors Joe Morton , Larenz Tate , Suzzanne Douglas , Glynn Turman
• Love Jones starred actors Nia Long , Larenz Tate , Isaiah Washington , Lisa Nicole Carson
• Why Do Fools Fall in Love starred actors Halle Berry , Vivica A. Fox , Larenz Tate , Lela Rochon
• The Postman starred actors Kevin Costner , Olivia Williams , Will Patton , Larenz Tate
• Dead Presidents starred actors Keith David , Chris Tucker , Larenz Tate
• A Man Apart starred actors Vin Diesel , Larenz Tate
Figure 1: Attention weights computed by the neural network attention
mechanisms at the last inference step T for each token. Higher shades
correspond to higher relevance scores for the related tokens.
Conclusions and Future Work
Pros and Cons
Pros
• Substantial margin over all the other end-to-end baselines
• Fully general model able to extract relevant information from a
generic document collection
• Learns latent relationships between document tokens thanks to
the feed-forward neural network in the prediction phase
• Provides multiple answers for a given question
Cons
• Performance on the Recs task is still not satisfactory
• The Recs task dataset has known issues, according to [6]
[6]: R. Searle and M. Bingham-Walker. “Why “Blow Out”? A Structural Analysis of the
Movie Dialog Dataset”. In: ACL 2016 (2016)
Future Work
• Design a ψ operator able to return relevant facts by recognizing
the most relevant information in the query
• Exploit user preferences and contextual information to learn the
user model
• Provide a mechanism which leverages attention weights to give
explanations [7]
• Collect dialog data with user information and feedback
• Design of a framework for dialog management based on
Reinforcement Learning [8]
[7]: B. Goodman and S. Flaxman. “European Union regulations on algorithmic
decision-making and a ”right to explanation””. In: arXiv preprint arXiv:1606.08813
(2016).
[8]: R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. Vol. 1. 1.
MIT press Cambridge, 1998
Appendix
Recurrent Neural Networks
• Recurrent Neural Networks (RNNs) are architectures suited to
modeling variable-length sequential data [9];
• The connections between their units may contain loops, which let
them take past states into account in the learning process;
• Their roots are in dynamical systems theory, in which the
following relation holds:

s(t) = f(s(t−1), x(t); θ)

where s(t) represents the current system state, computed by a
generic function f evaluated on the previous state s(t−1), x(t)
represents the current input, and θ are the network parameters.
[9] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations
by error propagation. Tech. rep. DTIC Document, 1985
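The recurrence can be illustrated by unrolling a toy linear f (a hypothetical choice; real RNNs use a learned nonlinear f) over a short input sequence:

```python
# unrolling s(t) = f(s(t-1), x(t); theta) with a toy linear f
theta = 0.5  # single shared parameter (toy value)

def f(prev_state, x):
    # the same function, with the same parameters, is applied at every timestep
    return theta * prev_state + x

state = 0.0  # initial state s(0)
for x in [1.0, 2.0, 3.0]:
    state = f(state, x)

print(state)  # 0.5 * (0.5 * 1 + 2) + 3 = 4.25
```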
RNN pros and cons
Pros
• Appropriate to represent sequential data;
• A versatile framework which can be applied to different tasks;
• Can learn short-term and long-term temporal dependencies.
Cons
• Vanishing/exploding gradient problem [10, 11];
• Difficulty in reaching satisfying minima when optimizing the
loss function;
• Difficult to parallelize the training process.
[10] Y. Bengio, P. Simard, and P. Frasconi. “Learning long-term dependencies with
gradient descent is difficult”. In: Neural Networks, IEEE Transactions on 5 (1994)
[11] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in
recurrent nets: the difficulty of learning long-term dependencies. 2001.
Gated Recurrent Unit
Gated Recurrent Unit (GRU) [12] is a special kind of RNN cell which
tries to solve the vanishing/exploding gradient problem.
GRU description taken from https://goo.gl/gJe8jZ.
[12] K. Cho et al. “Learning phrase representations using RNN encoder-decoder for
statistical machine translation”. In: arXiv preprint arXiv:1406.1078 (2014).
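A minimal numpy sketch of one GRU update (one common formulation of the cell from [12]; all sizes and weights below are toy values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, W, U):
    # W and U hold the input and recurrent weights for the z, r, h transforms
    z = sigmoid(W["z"] @ x + U["z"] @ h_prev)              # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h_prev)              # reset gate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * h_tilde                  # interpolated new state

d_in, d_h = 3, 4  # toy input and hidden sizes
rng = np.random.default_rng(3)
W = {k: rng.normal(size=(d_h, d_in)) for k in "zrh"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "zrh"}

h = gru_cell(rng.normal(size=d_in), np.zeros(d_h), W, U)
print(h.shape)  # (4,): the new hidden state
```

The gates make the new state a learned interpolation between the old state and the candidate, which is what lets gradients flow over long spans.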
Attention mechanism
• Mechanism inspired by the way the human brain is able to focus
on relevant aspects of a dynamic scene and supported by
studies in visual cognition [13];
• Neural networks equipped with an attention mechanism are
able to learn relevant parts of an input representation for a
specific task;
• Attention mechanisms have considerably boosted the performance
of Deep Learning techniques in many different tasks, such as
Computer Vision [14–16], Question Answering [17, 18] and
Machine Translation [19].
References
[1] T. Mahmood and F. Ricci. "Improving recommender systems with adaptive
conversational strategies". In: Proceedings of the 20th ACM Conference on
Hypertext and Hypermedia. ACM, 2009, pp. 73–82.
[2] N. Rubens, M. Elahi, M. Sugiyama, and D. Kaplan. "Active learning in
recommender systems". In: Recommender Systems Handbook. Springer, 2015,
pp. 809–846.
[3] A. Sordoni, P. Bachman, and Y. Bengio. "Iterative Alternating Neural
Attention for Machine Reading". In: arXiv preprint arXiv:1606.02245 (2016).
[4] J. Dodge et al. "Evaluating prerequisite qualities for learning
end-to-end dialog systems". In: arXiv preprint arXiv:1511.06931 (2015).
[5] M. Abadi et al. "TensorFlow: Large-Scale Machine Learning on
Heterogeneous Distributed Systems". In: CoRR abs/1603.04467 (2016).
URL: http://arxiv.org/abs/1603.04467.
[6] R. Searle and M. Bingham-Walker. "Why "Blow Out"? A Structural Analysis
of the Movie Dialog Dataset". In: ACL 2016 (2016), p. 215.
[7] B. Goodman and S. Flaxman. "European Union regulations on algorithmic
decision-making and a "right to explanation"". In: arXiv preprint
arXiv:1606.08813 (2016).
[8] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction.
Vol. 1. MIT Press, 1998.
[9] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning Internal
Representations by Error Propagation. Tech. rep. DTIC Document, 1985.
[10] Y. Bengio, P. Simard, and P. Frasconi. "Learning long-term dependencies
with gradient descent is difficult". In: IEEE Transactions on Neural
Networks 5.2 (1994), pp. 157–166.
[11] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient
Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.
2001.
[12] K. Cho et al. "Learning phrase representations using RNN
encoder-decoder for statistical machine translation". In: arXiv preprint
arXiv:1406.1078 (2014).
[13] R. A. Rensink. "The dynamic representation of scenes". In: Visual
Cognition 7.1-3 (2000), pp. 17–42.
[14] M. Denil, L. Bazzani, H. Larochelle, and N. de Freitas. "Learning where
to attend with deep architectures for image tracking". In: Neural
Computation 24.8 (2012), pp. 2151–2184.
[15] K. Xu et al. "Show, attend and tell: Neural image caption generation
with visual attention". In: arXiv preprint arXiv:1502.03044 (2015).
[16] V. Mnih, N. Heess, A. Graves, et al. "Recurrent models of visual
attention". In: Advances in Neural Information Processing Systems. 2014,
pp. 2204–2212.
[17] S. Sukhbaatar, J. Weston, R. Fergus, et al. "End-to-end memory
networks". In: Advances in Neural Information Processing Systems. 2015,
pp. 2440–2448.
[18] A. Graves, G. Wayne, and I. Danihelka. "Neural Turing machines". In:
arXiv preprint arXiv:1410.5401 (2014).
[19] D. Bahdanau, K. Cho, and Y. Bengio. "Neural machine translation by
jointly learning to align and translate". In: arXiv preprint
arXiv:1409.0473 (2014).

Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
 
IRJET- Factoid Question and Answering System
IRJET-  	  Factoid Question and Answering SystemIRJET-  	  Factoid Question and Answering System
IRJET- Factoid Question and Answering System
 
Chounta avouris arv2011
Chounta avouris arv2011Chounta avouris arv2011
Chounta avouris arv2011
 
Automatic Selection of Open Source Multimedia Softwares Using Error Back-Prop...
Automatic Selection of Open Source Multimedia Softwares Using Error Back-Prop...Automatic Selection of Open Source Multimedia Softwares Using Error Back-Prop...
Automatic Selection of Open Source Multimedia Softwares Using Error Back-Prop...
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
2-IJCSE-00536
2-IJCSE-005362-IJCSE-00536
2-IJCSE-00536
 
2-IJCSE-00536
2-IJCSE-005362-IJCSE-00536
2-IJCSE-00536
 
My experiment
My experimentMy experiment
My experiment
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
REVIEW PPT.pptx
REVIEW PPT.pptxREVIEW PPT.pptx
REVIEW PPT.pptx
 
Designing Progressive and Interactive Analytics Processes for High-Dimensiona...
Designing Progressive and Interactive Analytics Processes for High-Dimensiona...Designing Progressive and Interactive Analytics Processes for High-Dimensiona...
Designing Progressive and Interactive Analytics Processes for High-Dimensiona...
 
Evaluating Machine Learning Approaches to Classify Pharmacy Students’ Reflect...
Evaluating Machine Learning Approaches to Classify Pharmacy Students’ Reflect...Evaluating Machine Learning Approaches to Classify Pharmacy Students’ Reflect...
Evaluating Machine Learning Approaches to Classify Pharmacy Students’ Reflect...
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
 
Using Physiological sensing and scene reconstruction in remote collaboration
Using Physiological sensing and scene reconstruction in remote collaborationUsing Physiological sensing and scene reconstruction in remote collaboration
Using Physiological sensing and scene reconstruction in remote collaboration
 

Recently uploaded

"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
exukyp
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 

Recently uploaded (20)

"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 

Iterative Multi-document Neural Attention for Multiple Answer Prediction

  • 8. Iterative Multi-document Neural Attention for Multiple Answer Prediction The key contributions of this work are the following: 1. We extend the model reported in [3] to let the inference process exploit evidence observed in multiple documents 2. We design a model able to leverage the attention weights generated by the inference process to provide multiple answers 3. We assess the efficacy of our model through an experimental evaluation on the Movie Dialog [4] dataset [3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016) [4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog systems”. In: arXiv preprint arXiv:1511.06931 (2015). 6
  • 9. Iterative Multi-document Neural Attention for Multiple Answer Prediction Given a query q, ψ : Q → D produces the set of documents relevant for q, where Q is the set of all queries and D is the set of all documents. Our model defines a workflow in which a sequence of inference steps is performed: 1. Encoding phase 2. Inference phase • Query attentive read • Document attentive read • Gating search results 3. Prediction phase 7
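The retrieval operator ψ can be sketched in plain Python. This is a toy keyword-overlap ranker, not the actual implementation (the evaluation retrieves knowledge-base facts with Elasticsearch); all names here are illustrative:

```python
def psi(query, corpus, k=2):
    """Toy stand-in for the retrieval operator psi : Q -> D, ranking
    documents by keyword overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]

corpus = [
    "The Inkwell starred actors Larenz Tate",
    "Love Jones starred actors Larenz Tate",
    "Blade Runner starred actors Harrison Ford",
]
docs = psi("what does Larenz Tate act in ?", corpus)
# docs holds the two Larenz Tate facts, ranked above the unrelated one
```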
  • 10. Encoding phase Both queries and documents are represented by a sequence of words X = (x1, x2, . . . , x|X|), drawn from a vocabulary V. Each word is represented by a continuous d-dimensional word embedding x ∈ R^d stored in a word embedding matrix X ∈ R^{|V|×d}. Documents and queries are encoded using a bidirectional recurrent neural network with Gated Recurrent Units (GRU) as in [3]. Differently from [3], we build a unique representation for the whole set of documents related to the query by stacking the token representations of each document given by the bidirectional GRU. [3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016) 8
  • 11. Inference phase This phase uncovers a possible inference chain which models meaningful relationships between the query and the set of related documents. The inference chain is obtained by performing, for each timestep t = 1, 2, . . . , T, the attention mechanisms given by the query attentive read and the document attentive read. • query attentive read: performs an attention mechanism over the query at inference step t conditioned by the inference state • document attentive read: performs an attention mechanism over the documents at inference step t conditioned by the refined query representation and the inference state • gating search results: updates the inference state in order to retain information about the query and the documents which is useful for the inference process, and to forget useless information 9
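A minimal sketch of one inference step in plain Python, using simple dot-product attention and a toy state update (the actual model uses learned attention parameters and a gated, GRU-like state update; all names here are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attentive_read(encodings, condition):
    """Attention over token encodings conditioned on a vector:
    weights = softmax of similarities, output = weighted sum."""
    weights = softmax([dot(e, condition) for e in encodings])
    read = [sum(w * e[k] for w, e in zip(weights, encodings))
            for k in range(len(condition))]
    return weights, read

# One inference step t with toy 2-d encodings
query_enc = [[1.0, 0.0], [0.0, 1.0]]              # query token encodings
doc_enc = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]    # stacked document tokens
state = [0.2, 0.8]                                 # inference state s_{t-1}

_, q_read = attentive_read(query_enc, state)           # query attentive read
doc_weights, d_read = attentive_read(doc_enc, q_read)  # document attentive read
# gating search results: blend the old state with the new reads (toy update)
state = [0.5 * s + 0.25 * q + 0.25 * d
         for s, q, d in zip(state, q_read, d_read)]
```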
  • 12. Inference phase [3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016) 10
  • 13. Prediction phase • Leverages the document attention weights computed at the inference step t to generate a relevance score for each candidate answer • Relevance scores for each token coming from the l different documents Dq related to the query q are accumulated: score(w) = (1 / π(w)) · ∑_{i=1}^{l} ϕ(i, w) where: • ϕ(i, w) returns the score associated with the word w in document i • π(w) returns the frequency of the word w in Dq 11
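The accumulation formula can be sketched directly in plain Python; here ϕ is assumed, for illustration, to return an attention-derived weight for a word in a document:

```python
from collections import Counter

def score(word, docs, phi):
    """score(w) = (1 / pi(w)) * sum_{i=1}^{l} phi(i, w), where pi(w)
    is the frequency of w in the documents D_q retrieved for the query."""
    pi = Counter(w for doc in docs for w in doc)
    return sum(phi(i, word) for i in range(len(docs))) / pi[word]

# Toy example: phi returns the attention weight of `word` in document i
docs = [["larenz", "tate", "inkwell"], ["larenz", "tate", "love", "jones"]]
weights = [{"larenz": 0.4, "tate": 0.3, "inkwell": 0.3},
           {"larenz": 0.5, "tate": 0.2, "love": 0.2, "jones": 0.1}]
phi = lambda i, w: weights[i].get(w, 0.0)

val = score("larenz", docs, phi)
print(val)  # (0.4 + 0.5) / 2 = 0.45
```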
  • 14. Prediction phase • A 2-layer feed-forward neural network is used to learn latent relationships between tokens in documents • The output layer of the neural network generates a score for each candidate answer using a sigmoid activation function z = [score(w1), score(w2), . . . , score(w|V|)] y = sigmoid(W_ho · relu(W_ih · z + b_ih) + b_ho) where: • u is the hidden layer size • W_ih ∈ R^{|V|×u}, W_ho ∈ R^{|A|×u} are weight matrices • b_ih ∈ R^u, b_ho ∈ R^{|A|} are bias vectors • sigmoid(x) = 1 / (1 + e^{−x}) is the sigmoid function • relu(x) = max(0, x) is the ReLU activation function 12
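A plain-Python sketch of the 2-layer scorer. Weight matrices are stored row-major here (rows = output units, i.e. transposed with respect to the shapes on the slide), and the values are random toy parameters, not trained ones:

```python
import math
import random

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def affine(W, x, b):
    # W row-major: one row per output unit
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def predict(z, W_ih, b_ih, W_ho, b_ho):
    """2-layer feed-forward scorer: hidden ReLU layer over the
    accumulated token scores z, sigmoid output per candidate answer."""
    h = relu(affine(W_ih, z, b_ih))
    return sigmoid(affine(W_ho, h, b_ho))

random.seed(0)
V, u, A = 5, 3, 2                     # toy vocabulary, hidden, answer sizes
z = [0.45, 0.1, 0.0, 0.2, 0.05]       # score(w) for each token
W_ih = [[random.uniform(-1, 1) for _ in range(V)] for _ in range(u)]
W_ho = [[random.uniform(-1, 1) for _ in range(u)] for _ in range(A)]
y = predict(z, W_ih, [0.0] * u, W_ho, [0.0] * A)
# y[j] in (0, 1): independent relevance of answer j, so multiple
# candidate answers can be predicted at once
```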
  • 16. Movie Dialog bAbI Movie Dialog [4] dataset, composed of different tasks such as: • factoid QA (QA) • top-n recommendation (Recs) • QA+recommendation in a dialog fashion • Turns of dialogs taken from Reddit [4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog systems”. In: arXiv preprint arXiv:1511.06931 (2015). 13
  • 17. Experimental evaluation • Differently from [4], the relevant knowledge base facts, represented in triple form, are retrieved by ψ, implemented using the Elasticsearch engine • Evaluation metrics: • QA task: HITS@1 • Recs task: HITS@100 • The optimization method and tricks are adopted from [3] • The model is implemented in TensorFlow [5] and executed on an NVIDIA TITAN X GPU [3]: A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016) [4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog systems”. In: arXiv preprint arXiv:1511.06931 (2015). [5]: M. Abadi et al. “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems”. In: CoRR abs/1603.04467 (2016). 14
  • 18. Experimental evaluation

  METHODS                       QA TASK   RECS TASK
  QA SYSTEM                     90.7      N/A
  SVD                           N/A       19.2
  IR                            N/A       N/A
  LSTM                          6.5       27.1
  SUPERVISED EMBEDDINGS         50.9      29.2
  MEMN2N                        79.3      28.6
  JOINT SUPERVISED EMBEDDINGS   43.6      28.1
  JOINT MEMN2N                  83.5      26.5
  OURS                          86.8      30

  Table 1: Comparison between our model and baselines from [4] on the QA and Recs tasks, evaluated according to HITS@1 and HITS@100, respectively. [4]: J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog systems”. In: arXiv preprint arXiv:1511.06931 (2015). 15
  • 19. Inference phase attention weights Question: what does Larenz Tate act in ? Ground truth answers: The Postman, A Man Apart, Dead Presidents, Love Jones, Why Do Fools Fall in Love, The Inkwell Most relevant sentences: • The Inkwell starred actors Joe Morton , Larenz Tate , Suzzanne Douglas , Glynn Turman • Love Jones starred actors Nia Long , Larenz Tate , Isaiah Washington , Lisa Nicole Carson • Why Do Fools Fall in Love starred actors Halle Berry , Vivica A. Fox , Larenz Tate , Lela Rochon • The Postman starred actors Kevin Costner , Olivia Williams , Will Patton , Larenz Tate • Dead Presidents starred actors Keith David , Chris Tucker , Larenz Tate • A Man Apart starred actors Vin Diesel , Larenz Tate Figure 1: Attention weights computed by the neural network attention mechanisms at the last inference step T for each token. Higher shades correspond to higher relevance scores for the related tokens. 16
  • 21. Pros and Cons Pros • Huge gap between our model and all the other baselines • Fully general model able to extract relevant information from a generic document collection • Learns latent relationships between document tokens thanks to the feed-forward neural network in the prediction phase • Provides multiple answers for a given question Cons • Still not satisfying performance on the Recs task • Issues in the Recs task dataset according to [6] [6]: R. Searle and M. Bingham-Walker. “Why “Blow Out”? A Structural Analysis of the Movie Dialog Dataset”. In: ACL 2016 (2016) 17
  • 22. Future Work • Design a ψ operator able to return relevant facts recognizing the most relevant information in the query • Exploit user preferences and contextual information to learn the user model • Provide a mechanism which leverages attention weights to give explanations [7] • Collect dialog data with user information and feedback • Design of a framework for dialog management based on Reinforcement Learning [8] [7]: B. Goodman and S. Flaxman. “European Union regulations on algorithmic decision-making and a ”right to explanation””. In: arXiv preprint arXiv:1606.08813 (2016). [8]: R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. Vol. 1. 1. MIT press Cambridge, 1998 18
  • 24. Recurrent Neural Networks • Recurrent Neural Networks (RNN) are architectures suitable to model variable-length sequential data [9]; • The connections between their units may contain loops which let them consider past states in the learning process; • Their roots lie in dynamical systems theory, in which the following relation holds: s^(t) = f(s^(t−1), x^(t); θ) where s^(t) represents the current system state computed by a generic function f evaluated on the previous state s^(t−1), x^(t) represents the current input and θ are the network parameters. [9] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. Tech. rep. DTIC Document, 1985 19
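The recurrence above can be sketched in a few lines of plain Python, choosing f = tanh of an affine map as a concrete (illustrative) instance:

```python
import math

def step(s_prev, x, theta):
    """One RNN step s_t = f(s_{t-1}, x_t; theta); here f is
    tanh(W_s * s_{t-1} + W_x * x_t + b)."""
    W_s, W_x, b = theta
    return [math.tanh(sum(ws * sv for ws, sv in zip(row_s, s_prev))
                      + sum(wx * xv for wx, xv in zip(row_x, x)) + bi)
            for row_s, row_x, bi in zip(W_s, W_x, b)]

# Unrolling over a sequence: the same parameters theta are reused
# at every timestep, which is what makes the network recurrent
theta = ([[0.5, 0.0], [0.0, 0.5]],   # W_s (toy 2x2)
         [[1.0, 0.0], [0.0, 1.0]],   # W_x (toy 2x2)
         [0.0, 0.0])                 # b
s = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    s = step(s, x, theta)
```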
  • 25. RNN pros and cons Pros • Appropriate to represent sequential data; • A versatile framework which can be applied to different tasks; • Can learn short-term and long-term temporal dependencies. Cons • Vanishing/exploding gradient problem [10, 11]; • Difficulty in reaching satisfying minima when optimizing the loss function; • The training process is difficult to parallelize. [10] Y. Bengio, P. Simard, and P. Frasconi. “Learning long-term dependencies with gradient descent is difficult”. In: Neural Networks, IEEE Transactions on 5 (1994) [11] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. 2001. 20
  • 26. Gated Recurrent Unit Gated Recurrent Unit (GRU) [12] is a special kind of RNN cell which tries to solve the vanishing/exploding gradient problem. GRU description taken from https://goo.gl/gJe8jZ. [12] K. Cho et al. “Learning phrase representations using RNN encoder-decoder for statistical machine translation”. In: arXiv preprint arXiv:1406.1078 (2014). 21
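A plain-Python sketch of a single GRU step, following the common convention h_t = (1 − z_t) ∘ h_{t−1} + z_t ∘ h̃_t, with biases omitted for brevity (toy identity weights, not trained parameters):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, x):
    return [sum(w * xv for w, xv in zip(row, x)) for row in W]

def gru_cell(x, h, p):
    """One GRU step: the update gate z and reset gate r decide how much
    of the previous state to keep, which mitigates vanishing gradients."""
    Wz, Uz, Wr, Ur, Wh, Uh = p
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]                 # reset old state
    h_tilde = [math.tanh(a + b)                            # candidate state
               for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1 - zi) * hi + zi * hti                       # gated blend
            for zi, hi, hti in zip(z, h, h_tilde)]

I2 = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 identity weights
h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:
    h = gru_cell(x, h, (I2, I2, I2, I2, I2, I2))
```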
  • 27. Attention mechanism • Mechanism inspired by the way the human brain is able to focus on relevant aspects of a dynamic scene, supported by studies in visual cognition [13]; • Neural networks equipped with an attention mechanism are able to learn the parts of an input representation relevant for a specific task; • Attention mechanisms have dramatically boosted the performance of deep learning techniques in many different tasks, such as Computer Vision [14–16], Question Answering [17, 18] and Machine Translation [19]. 22
  • 29. [1] T. Mahmood and F. Ricci. “Improving recommender systems with adaptive conversational strategies”. In: Proceedings of the 20th ACM conference on Hypertext and hypermedia. ACM. 2009, pp. 73–82. [2] N. Rubens, M. Elahi, M. Sugiyama, and D. Kaplan. “Active learning in recommender systems”. In: Recommender Systems Handbook. Springer, 2015, pp. 809–846. [3] A. Sordoni, P. Bachman, and Y. Bengio. “Iterative Alternating Neural Attention for Machine Reading”. In: arXiv preprint arXiv:1606.02245 (2016). [4] J. Dodge et al. “Evaluating prerequisite qualities for learning end-to-end dialog systems”. In: arXiv preprint arXiv:1511.06931 (2015). [5] Martín Abadi et al. “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems”. In: CoRR abs/1603.04467 (2016). url: http://arxiv.org/abs/1603.04467. 22
  • 30. [6] R. Searle and M. Bingham-Walker. “Why “Blow Out”? A Structural Analysis of the Movie Dialog Dataset”. In: ACL 2016 (2016), p. 215. [7] Bryce Goodman and Seth Flaxman. “European Union regulations on algorithmic decision-making and a ‘right to explanation’”. In: arXiv preprint arXiv:1606.08813 (2016). [8] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. Vol. 1. 1. MIT press Cambridge, 1998. [9] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal representations by error propagation. Tech. rep. DTIC Document, 1985. [10] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. “Learning long-term dependencies with gradient descent is difficult”. In: IEEE transactions on neural networks 5.2 (1994), pp. 157–166. 22
  • 31. [11] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. 2001. [12] Kyunghyun Cho et al. “Learning phrase representations using RNN encoder-decoder for statistical machine translation”. In: arXiv preprint arXiv:1406.1078 (2014). [13] Ronald A Rensink. “The dynamic representation of scenes”. In: Visual cognition 7.1-3 (2000), pp. 17–42. [14] Misha Denil, Loris Bazzani, Hugo Larochelle, and Nando de Freitas. “Learning where to attend with deep architectures for image tracking”. In: Neural computation 24.8 (2012), pp. 2151–2184. [15] Kelvin Xu et al. “Show, attend and tell: Neural image caption generation with visual attention”. In: arXiv preprint arXiv:1502.03044 2.3 (2015), p. 5. 22
  • 32. [16] Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. “Recurrent models of visual attention”. In: Advances in Neural Information Processing Systems. 2014, pp. 2204–2212. [17] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. “End-to-end memory networks”. In: Advances in neural information processing systems. 2015, pp. 2440–2448. [18] Alex Graves, Greg Wayne, and Ivo Danihelka. “Neural turing machines”. In: arXiv preprint arXiv:1410.5401 (2014). [19] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neural machine translation by jointly learning to align and translate”. In: arXiv preprint arXiv:1409.0473 (2014). 22