Deep learning enabled Question Answering models
PROJECT WORK PRESENTATION
Saurabh Saxena
2015HT12604
Introduction
DEEP LEARNING AND QUESTION ANSWERING SYSTEMS
Deep learning
 What is Deep learning?
Deep learning is a new area of machine learning research that uses multi-
layered artificial neural networks. The objective is to learn multiple levels of
representation and abstraction that help make sense of data such as images,
sound, and text. It is becoming increasingly relevant for three
key reasons:
 An infinitely flexible function – universal function approximation via Neural networks
 All-purpose parameter fitting – using gradient descent and its variants
 Fast and scalable – availability of cheap GPUs for fast matrix multiplications
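To ground the second point, here is a toy illustration (plain NumPy, arbitrary synthetic data, not from the project) of fitting parameters by gradient descent:

```python
import numpy as np

# Toy data: y = 3x + 1 plus noise; recover w and b by gradient descent
rng = np.random.RandomState(0)
x = rng.randn(100)
y = 3.0 * x + 1.0 + 0.1 * rng.randn(100)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    err = w * x + b - y            # prediction error per sample
    w -= lr * (err * x).mean()     # gradient of the squared error w.r.t. w
    b -= lr * err.mean()           # gradient w.r.t. b
print(round(w, 2), round(b, 2))    # converges near 3.0 and 1.0
```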
 Typical applications of Deep learning
 Convolutional Neural Networks (CNNs) in computer vision and machine translation
 Recurrent Neural Networks (RNNs) such as LSTM/GRU in language modeling
 Tree Neural Networks (TNNs) in sentiment analysis
 Reinforcement learning in Game playing and intelligent agents
Basic Building blocks of deep learning
Most DL networks (including question answering models) are composed of
these basic building blocks (a minimal sketch of how they compose follows the list):
• Fully Connected Network
• Word Embedding
• Convolutional Neural Network
• Recurrent Neural Network
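As a quick illustration of how these blocks stack, here is a minimal Keras sketch (Keras being the library used later in this report); the vocabulary and layer sizes are placeholder values, not the project's actual configuration:

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

# Placeholder hyper-parameters, chosen only for illustration
vocab_size, embed_dim, seq_len = 10000, 64, 100

model = Sequential()
model.add(Embedding(vocab_size, embed_dim, input_length=seq_len))  # word embedding
model.add(Conv1D(32, 3, activation='relu'))   # convolution over local n-grams
model.add(MaxPooling1D(2))
model.add(LSTM(32))                           # recurrent summary of the sequence
model.add(Dense(vocab_size, activation='softmax'))  # fully connected output
model.compile(optimizer='adam', loss='categorical_crossentropy')
```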
General Architecture of a Deep model
 What is a Question Answering System?
The basic idea of an automated QA system is to extract information from
documents and, given a user query, provide a short and concise answer that
meets the user's information needs.
 Traditional QA systems are basically of 2 types:
 Information Retrieval (IR) based QA – match-and-ranking based broad-domain
QA using mostly unstructured data, e.g., search engines
 Knowledge-based (KB) QA – semantic representation of the query over structured
data such as triple stores or SQL, e.g., Freebase, DBpedia, and Wolfram Alpha
 Question types
 Factoid questions – DeepMind CNN/DailyMail dataset
 Cloze-style questions – MCTest dataset and bAbI
 Open-domain question answering – WikiQA and LAMBADA
QA systems
QA scenarios
Motivations – What deep learning can do for QA systems?
 The traditional QA pipeline relies heavily on manual feature engineering; the aim of
deep learning models is to eliminate this.
 The goal is to build systems that can directly read documents and then answer
questions based on those documents.
 RNNs have been successful in language modeling and generation but have not
achieved much success in QA, because they cannot store enough context in their
hidden states. Answering complex questions requires supporting facts
from far back in the past.
 RNNs also suffer from the vanishing-gradient problem if too many time-steps are used.
 Solution – incorporate explicit memory in the model, along with a way to address
that memory for reads and writes.
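To make the read-addressing idea concrete, here is a minimal NumPy sketch (illustrative only, not the project code) of soft, attention-based memory reading: the query scores every memory slot, the scores are softmax-normalized, and the result is a weighted sum of the slots:

```python
import numpy as np

def read_memory(memory, query):
    """Soft attention read: memory is (slots, dim), query is (dim,)."""
    scores = memory @ query                  # dot-product relevance per slot
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                  # weighted sum over memory slots

memory = np.random.randn(10, 32)   # 10 memory slots, 32-dim each
query = np.random.randn(32)
context = read_memory(memory, query)  # 32-dim blend of the relevant slots
```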
Memory networks for QA
AND THEIR VARIANTS
What are Memory Networks?
 A class of models that combine a large memory with a learning component that
can read from and write to it.
 Incorporate reasoning with attention over memory (RAM).
 Most ML models have limited memory, which is more or less all that is needed for
"low-level" tasks, e.g., object detection.
 Long-term memory is required to read a story and then, e.g., answer
questions about it.
 It is also required for dialog: to remember previous dialog (short- and
long-term) and respond.
 The models are scalable – they can store and read large amounts of data in memory,
e.g., an entire KB.
All MemNNs have four component networks (which may or
may not have shared parameters):
 I: (input feature map) converts incoming data to the internal feature
representation.
 G: (generalization) updates memories given new input.
 O: produces new output (in feature-representation space) given the
memories.
 R: (response) converts the output of O into the response seen by the outside world.
Step 1: the controller converts incoming data to an internal feature representation (I).
Step 2: the write head updates the memories and writes the data into memory (G).
Step 3: given the external input, the read head reads the memory and fetches relevant data (O).
Step 4: the controller combines the external data with the memory contents returned by the read head to generate the output (O, R).
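Putting the four steps together, the control loop can be sketched as a small Python class. This is a hypothetical illustration of the I-G-O-R decomposition (simple slot writes plus the soft attention read from the earlier sketch), not the actual MemNN implementation:

```python
import numpy as np

class MemoryNetworkSketch:
    def __init__(self, slots, dim):
        self.memory = np.zeros((slots, dim))
        self.next_slot = 0

    def I(self, x):
        # Input feature map: embed raw input into the internal feature space.
        # For this sketch, inputs are assumed to be feature vectors already.
        return x

    def G(self, feat):
        # Generalization: here, simply write the new fact into the next slot.
        self.memory[self.next_slot % len(self.memory)] = feat
        self.next_slot += 1

    def O(self, q):
        # Output: soft attention read over memory given the question features.
        scores = self.memory @ q
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.memory

    def R(self, o):
        # Response: decode output features into an answer; a real model would
        # map this vector to a word via a softmax over the vocabulary.
        return o

    def answer(self, facts, question):
        for f in facts:                           # steps 1-2: encode and store
            self.G(self.I(f))
        return self.R(self.O(self.I(question)))  # steps 3-4: read and respond
```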
State-of-the-art Memory Networks
Datasets to train Deep QA models
BABI, LAMBADA, MCTEST AND MORE…
Datasets available to train/test QA models
 Facebook bAbI – A set of 20 tasks for testing text understanding
and reasoning. For each task, there are 10,000 questions for training and 1,000 for
testing. Each task tests the machine on a specific skill set. (The task file format is
illustrated after this list.)
https://research.fb.com/downloads/babi/
 Facebook bAbI Children's Book Test (CBT) – Text passages and corresponding
questions drawn from Project Gutenberg children's books. 669,343 training
questions, 8,000 dev questions, and 10,000 test questions.
 MCTest – Consists of 500 stories and 2,000 questions. Because the stories are
fictional, the answer can typically be found only in the story itself. Requires machines
to answer multiple-choice reading-comprehension questions about fictional stories,
directly tackling the high-level goal of open-domain machine comprehension.
http://research.microsoft.com/en-us/um/redmond/projects/mctest/
 Language Modeling Broadened to Account for Discourse Aspects (LAMBADA) –
Consists of 10,022 passages, divided into 4,869 development and 5,153
test passages (extracted from 1,331 and 1,332 disjoint novels, respectively). The
average passage consists of 4.6 sentences of context plus 1 target sentence, for
a total length of 75.4 tokens (dev) / 75 tokens (test).
http://clic.cimec.unitn.it/lambada/
 DeepMind CNN and DailyMail dataset – Collection of news articles and
corresponding cloze queries. Each dataset contains many documents (90k and 197k,
respectively), and each document has approximately 4 questions on average. Each
question is a sentence with one missing word/phrase that can be found in the
accompanying document/context.
http://cs.nyu.edu/~kcho/DMQA/
 Stanford Question Answering Dataset (SQuAD) – Reading-comprehension dataset
consisting of questions posed by crowdworkers on a set of Wikipedia articles. The
answer to every question is a segment of text, or span, from the corresponding
reading passage. There are 100,000+ question-answer pairs on 500+ articles.
https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/
 AI2 Science Exams – Elementary science questions from US state and regional
science exams. 170 multi-state and 108 4th-grade questions.
http://allenai.org/data/science-exam-questions.html
 WikiQA – 3,047 questions sampled from Bing query logs. Each question is associated
with a Wikipedia page, and all sentences in the summary paragraph of that page
become the candidate answers. Only about a third of the questions have a correct
answer in the candidate answer set.
https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/
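As referenced in the bAbI entry above, bAbI task files are plain text: each line starts with a sentence number, and question lines additionally carry a tab-separated answer and the IDs of the supporting facts. A minimal parser sketch (the file name in the usage comment is a placeholder that depends on the downloaded release):

```python
def parse_babi(path):
    """Yield (story_sentences, question, answer, supporting_ids) tuples."""
    stories, story = [], []
    with open(path) as f:
        for line in f:
            idx, text = line.strip().split(' ', 1)
            if int(idx) == 1:     # sentence numbering restarts for each story
                story = []
            if '\t' in text:      # question line: question \t answer \t ids
                question, answer, support = text.split('\t')
                stories.append((list(story), question.strip(), answer,
                                [int(i) for i in support.split()]))
            else:
                story.append(text)
    return stories

# Hypothetical usage:
# tasks = parse_babi('tasks_1-20_v1-2/en/qa1_single-supporting-fact_train.txt')
```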
Facebook bAbI dataset – 20 tasks
• Single supporting fact
• Two supporting facts
• Three supporting facts
• Two argument relations
• Three argument relations
• Yes/No questions
• Counting
• Lists/sets
• Simple Negation
• Indefinite Knowledge
• Basic Coreference
• Conjunction
• Compound Coreference
• Time Reasoning
• Basic Deduction
• Basic Induction
• Positional Reasoning
• Size Reasoning
• Path Finding
• Agent’s Motivations
The 20 tasks in brief…
End-to-End MemNN
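Since the end-to-end memory network is the model trained in this project, a condensed sketch of its attention mechanism is given below. It loosely follows the publicly available Keras bAbI memory-network example; the vocabulary and length constants are placeholders rather than the exact values used here:

```python
from keras.models import Model, Sequential
from keras.layers import Input, Embedding, Dropout, Activation, Dense
from keras.layers import Permute, add, dot, concatenate, LSTM

vocab_size, story_maxlen, query_maxlen = 22, 68, 4  # placeholder sizes

story = Input((story_maxlen,))
question = Input((query_maxlen,))

# Memory encoders: one embedding for addressing (m), one for content (c)
encoder_m = Sequential([Embedding(vocab_size, 64), Dropout(0.3)])
encoder_c = Sequential([Embedding(vocab_size, query_maxlen), Dropout(0.3)])
encoder_q = Sequential([Embedding(vocab_size, 64, input_length=query_maxlen),
                        Dropout(0.3)])

m, c, q = encoder_m(story), encoder_c(story), encoder_q(question)

# Attention: match each story position against the question
match = Activation('softmax')(dot([m, q], axes=(2, 2)))
response = Permute((2, 1))(add([match, c]))

# Combine the memory response with the question and decode the answer word
answer = LSTM(32)(concatenate([response, q]))
answer = Dense(vocab_size, activation='softmax')(Dropout(0.3)(answer))

model = Model([story, question], answer)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
```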
Dynamic MemNN
Key-value MemNN Architecture
Experimental Setup to train deep models
GPU, THEANO, KERAS, CUDA, CUDNN AND MORE…
Component – Description
Operating System – Ubuntu 16.04 VM on an Intel octa-core CPU with 6.5 GB RAM
Graphics Card – NVIDIA Tesla K80 with 12 GB RAM and 2,496 CUDA cores
Graphics Toolkit – CUDA 8.0 with cuDNN 6.0
Python Package Manager – Anaconda (Continuum Analytics) for Python 2.7
Deep learning library – Keras v2.0.2 with Theano v0.9.0 backend
Other Python modules:
 bcolz v1.0.0 for fast saving/loading of trained weights
 NumPy v1.12.1 for all multi-dimensional numeric manipulations
 scikit-learn v0.18.1 for preprocessing, pipelining, feature extraction, decomposition, dataset
splits, and all general non-deep machine learning algorithms
 cPickle for saving models
 NLTK toolkit for traditional linguistic tasks
 Matplotlib v2.0.0 for visualizing data
 pydot v1.0.28 and Graphviz v2.38.0 for visualizing deep models
 OpenBLAS v0.2.19 for fast linear algebra operations
 Pandas v0.19.2 for structured data manipulation
 Protobuf v3.0.0 for protocol buffering
 Flask v0.12 for web display
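For reproducibility, note that Keras picks its backend from the KERAS_BACKEND environment variable (or from ~/.keras/keras.json). A small sanity-check snippet, assuming the versions listed above are installed:

```python
import os
os.environ['KERAS_BACKEND'] = 'theano'  # must be set before importing keras

import keras                     # prints "Using Theano backend."
import theano

print(keras.__version__)         # expected: 2.0.2 in this setup
print(theano.__version__)        # expected: 0.9.0
print(keras.backend.backend())   # 'theano'
```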
Experimental setup in Google Cloud
Compute Engine setup in Google Cloud
GPU details
Training Summary
MODELS, TEST ACCURACY AND MORE…
Model summary for bAbI Task #1
Training summary for bAbI Task #1 – one supporting fact
Training summary for bAbI Task #2 – two supporting facts
Joint training on all 20 tasks simultaneously
Demo on bAbI tasks – Correct answers
Demo – Incorrect answer
Future work
 Train the Dynamic Memory Network on the bAbI dataset
 Train the Key-Value Memory Network on the bAbI dataset
 Evaluate the performance of the current models on other datasets such as
LAMBADA and Stanford SQuAD
 Explore transfer learning, so that models trained on open-source
datasets can be applied to corporate datasets with only fine-tuning
 Explore the use of trained models in dialog modeling for helpdesk
question answering
Thanks
