Deep Learning Models for Question Answering – Sujit Pal
A talk about a hobby project applying Deep Learning models to predict answers to 8th-grade science multiple-choice questions for the Allen AI challenge on Kaggle.
I will cover what QA is, how we can answer natural language questions automatically, and how successful the field has been.
My understanding comes from three key papers and the reading I did around them.
Neural Language Generation Head to Toe – Hady Elsahar
This is a gentle, intuitive introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language, to modeling probabilities of sequences, to recurrent neural networks, to the large Transformer models you have seen in the news such as GPT-2/GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
A Simple Introduction to Word Embeddings – Bhaskar Mitra
In information retrieval there is a long history of learning vector representations for words. In recent times, neural word embeddings have gained significant popularity for many natural language processing tasks, such as word analogy and machine translation. The goal of this talk is to introduce the basic intuitions behind these simple but elegant models of text representation. We will start our discussion with classic vector space models and then make our way to recently proposed neural word embeddings. We will see how these models can be useful for analogical reasoning as well as how they apply to many information retrieval tasks.
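The analogical-reasoning property mentioned above can be sketched with toy vectors: solve "man is to king as woman is to ?" by nearest-neighbour search around vec(king) - vec(man) + vec(woman). The 3-d vectors below are invented purely for illustration; real embeddings such as word2vec are learned from large corpora and have hundreds of dimensions.

```python
import math

# Toy 3-d "embeddings" (made up for illustration; real embeddings such as
# word2vec are learned from large corpora and are much higher-dimensional).
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.05, 0.0, 0.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def analogy(a, b, c):
    """Solve a : b :: c : ? by nearest neighbour to vec(b) - vec(a) + vec(c)."""
    target = [x - y + z for x, y, z in zip(emb[b], emb[a], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))

print(analogy("man", "king", "woman"))  # "queen" for these toy vectors
```

With real learned embeddings the same arithmetic recovers many analogies; here the toy vectors are simply constructed so that it works.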
Fine-tune and deploy Hugging Face NLP models – OVHcloud
Are you currently managing AI projects that require a lot of GPU power?
Are you tired of managing the complexity of your infrastructure, GPU instances, and Kubeflow yourself?
Need flexibility for your AI platform or SaaS solution?
OVHcloud innovates in AI by offering simple and turnkey solutions to train your models and put them into production.
Natural Language Processing (NLP) is often taught at the academic level from the perspective of computational linguists. However, as data scientists, we have a richer view of the world of natural language - unstructured data that by its very nature has important latent information for humans. NLP practitioners have benefitted from machine learning techniques to unlock meaning from large corpora, and in this class we’ll explore how to do that particularly with Python, the Natural Language Toolkit (NLTK), and to a lesser extent, the Gensim Library.
NLTK is an excellent library for machine learning-based NLP, written in Python by experts from both academia and industry. Python allows you to create rich data applications rapidly, iterating on hypotheses. Gensim provides vector-based topic modeling, which is currently absent in both NLTK and Scikit-Learn. The combination of Python + NLTK means that you can easily add language-aware data products to your larger analytical workflows and applications.
Question Answering System using a machine learning approach – Garima Nanda
A compact presentation showing how machine learning classification techniques can be used for effective and efficient question answering interaction.
Financial Question Answering with BERT Language Models – Bithiah Yuan
FinBERT-QA is a Question Answering system for retrieving opinionated financial passages from task 2 of the FiQA dataset. The system uses techniques from information retrieval, natural language processing, and deep learning.
Artificial Intelligence applications such as Machine Learning and Deep Learning have become an important part of our lives. The products we buy, whether or not we qualify for a bank loan, the movies or series Netflix recommends to us, self-driving cars, object recognition, etc.; all of that information is directed toward us by these algorithms.
Today, these fields of study are among the most exciting and challenging in computing, due to their high level of complexity and strong market demand. In this presentation we will get to know these concepts and learn to tell them apart, since they are indispensable tools for improving human life.
Below are some of the specific topics that will be covered:
- Context of ML and DL within Artificial Intelligence.
- Machine Learning.
- Supervised Learning.
- Unsupervised Learning.
- Deep Learning.
- Artificial Neural Networks.
- Convolutional Neural Networks.
- Applications of ML and DL.
These slides introduce the domain of NLP and the basic NLP pipeline commonly used in Computational Linguistics.
They also explore Topic Modeling via LDA (Latent Dirichlet Allocation) and its steps in detail.
Thanks for your time. If you enjoyed this short video, there are many more topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
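To make LDA's steps concrete, here is a minimal collapsed Gibbs sampling sketch, written for this summary as a generic toy illustration (it is not code from the original video): each token is assigned a topic, and assignments are resampled from the full conditional built from document-topic and topic-word counts.

```python
import numpy as np

def lda_gibbs(docs, K=2, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (toy illustration)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V, D = len(vocab), len(docs)
    z = [rng.integers(0, K, size=len(d)) for d in docs]  # topic of each token
    ndk = np.zeros((D, K))   # document-topic counts
    nkw = np.zeros((K, V))   # topic-word counts
    nk = np.zeros(K)         # topic totals
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, wid[w]] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]  # remove this token's current assignment
                ndk[d, k] -= 1; nkw[k, wid[w]] -= 1; nk[k] -= 1
                # full conditional: P(k) ~ (n_dk + alpha)(n_kw + beta)/(n_k + V*beta)
                p = (ndk[d] + alpha) * (nkw[:, wid[w]] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1; nkw[k, wid[w]] += 1; nk[k] += 1
    return ndk, nkw, vocab

docs = [["cat", "dog", "pet"], ["dog", "pet", "cat"],
        ["stock", "market", "trade"], ["market", "trade", "stock"]]
doc_topic, topic_word, vocab = lda_gibbs(docs)
```

In practice one would use a library implementation (e.g. Gensim's `LdaModel`) rather than this sketch, but the count-update loop above is the core of the algorithm.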
Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201... – Andrew Gardner
Note: these are the slides from a presentation at Lexis Nexis in Alpharetta, GA, on 2014-01-08 as part of the DataScienceATL Meetup. A video of this talk from Dec 2013 is available on vimeo at http://bit.ly/1aJ6xlt
Note: Slideshare mis-converted the images in slides 16-17. Expect a fix in the next couple of days.
---
Deep learning is a hot area of machine learning named one of the "Breakthrough Technologies of 2013" by MIT Technology Review. The basic ideas extend neural network research from past decades and incorporate new discoveries in statistical machine learning and neuroscience. The results are new learning architectures and algorithms that promise disruptive advances in automatic feature engineering, pattern discovery, data modeling and artificial intelligence. Empirical results from real world applications and benchmarking routinely demonstrate state-of-the-art performance across diverse problems including: speech recognition, object detection, image understanding and machine translation. The technology is employed commercially today, notably in many popular Google products such as Street View, Google+ Image Search and Android Voice Recognition.
In this talk, we will present an overview of deep learning for data scientists: what it is, how it works, what it can do, and why it is important. We will review several real world applications and discuss some of the key hurdles to mainstream adoption. We will conclude by discussing our experiences implementing and running deep learning experiments on our own hardware data science appliance.
Artificial Intelligence, Machine Learning and Deep Learning – Sujit Pal
Slides for a talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group in San Francisco. My part of the talk is covered in slides 19-34.
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: One for the Youtube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The Youtube video is shown on the first page of the slide deck, for slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take Deep Learning to task with real world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Transform your Business with AI, Deep Learning and Machine Learning – Sri Ambati
Video: https://www.youtube.com/watch?v=R3IXd1iwqjc
Meetup: http://www.meetup.com/SF-Bay-ACM/events/231709894/
In this talk, Arno Candel presents a brief history of AI and how Deep Learning and Machine Learning techniques are transforming our everyday lives. Arno will introduce H2O, a scalable open-source machine learning platform, and show live demos on how to train sophisticated machine learning models on large distributed datasets. He will show how data scientists and application developers can use the Flow GUI, R, Python, Java, Scala, JavaScript and JSON to build smarter applications, and how to take them to production. He will present customer use cases from verticals including insurance, fraud, churn, fintech, and marketing.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Relations play a vital role in knowledge construction and maintenance. For example, they connect domain-type entities to range-type entities: the relation born in connects Persons to Places. Over any dataset, domain-range information is used to maintain data consistency. Therefore, knowledge construction frameworks sometimes engage costly Knowledge Engineers to define domain-range information in the form of a schema or an ontology. We also see that frameworks that hold such defined domain-range information often do not follow it strictly. In the worst case, some frameworks do not even allow a domain-range to be defined; they just gather knowledge entries. One reason for not defining domain-range information is that it is costly. On the other hand, the reason for not following domain-range constraints is that most existing approaches are either manual or semi-automatic and therefore adapt poorly. In this research, we propose a relation-wise machine learning model that can define and validate domain-range information automatically. Initial experiments show that the proposed framework performs promisingly.
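As a concrete illustration of the simplest possible baseline for this task (the abstract's actual model is machine-learned; the counting approach, the function names, and the `min_support` threshold below are all hypothetical), one can induce a relation's domain and range by counting the entity types observed with it:

```python
from collections import Counter, defaultdict

def induce_domain_range(triples, entity_type, min_support=0.5):
    """Induce (domain, range) per relation from typed triples.

    triples: iterable of (subject, relation, object)
    entity_type: dict mapping entity -> type
    A type is kept only if it covers at least `min_support` of the
    relation's occurrences (a hypothetical threshold).
    """
    dom = defaultdict(Counter)
    rng = defaultdict(Counter)
    for s, r, o in triples:
        dom[r][entity_type[s]] += 1   # count subject (domain) types
        rng[r][entity_type[o]] += 1   # count object (range) types

    def top(counter):
        t, n = counter.most_common(1)[0]
        return t if n / sum(counter.values()) >= min_support else None

    return {r: (top(dom[r]), top(rng[r])) for r in dom}

types = {"Einstein": "Person", "Ulm": "Place", "Curie": "Person", "Warsaw": "Place"}
triples = [("Einstein", "born_in", "Ulm"), ("Curie", "born_in", "Warsaw")]
print(induce_domain_range(triples, types))  # {'born_in': ('Person', 'Place')}
```

Validation then amounts to flagging any new triple whose subject or object type disagrees with the induced pair.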
In this talk we cover
1. Why NLP and DL
2. Practical Challenges
3. Some Popular Deep Learning models for NLP
Today you can take any webpage in any language and translate it automatically into a language you know! You can also cut and paste an article or other document into NLP systems and immediately get a list of the companies and people it mentions, the topics that are relevant, and the sentiment of the document. When you talk to the Google or Amazon assistant, you are using NLP systems. NLP is not perfect, but given the advances of the last two years, and continuing, it is a growing field. Let's see how it actually works, specifically using deep learning.
About Shishir
Shishir is a Senior Data Scientist at Thomson Reuters working on Deep Learning and NLP to solve real customer pain, even ones they have become used to.
Improving Search in Workday Products using Natural Language Processing – DataWorks Summit
Workday is a leading provider of cloud-based enterprise software products such as Human Capital Management, Talent, Finance, Student, Planning etc. These products produce a wealth of natural language data. However, this data is unstructured and denormalized. Retrieving relevant information from such data is a challenging task. Using simple index-based search methods can only take us so far. The Data Science team at Workday is determined to apply Machine Learning and AI to make search better across Workday’s products.
In this session, we present how we use word embeddings to normalize the data and add structure to it. We will also talk about using word representations to make search intelligent. The specific use cases we will discuss are synonym detection and entity recommendation.
In this talk, we will focus on the word-embeddings techniques explored, metrics used to evaluate Natural Language Processing Models, tools built, and future work as a part of improving search.
Speaker
Namrata Ghadi, Workday Inc, Software Development Engineer (Data Science)
Adam Baker, Workday Inc, Sr Software Engineer
Talk on Ebooks at the NSF BPC/CE21/STEM-C Community Meeting – Mark Guzdial
Why we should use ebooks (rather than MOOCs) for CS learning opportunities for high school teachers. We use educational psychology principles to design our book. The talk presents data from our first three studies: usability, log file analysis, and learnability.
Natural Language Processing: L01 Introduction – ananth
This presentation introduces the course Natural Language Processing (NLP) by enumerating a number of applications, course positioning, challenges presented by Natural Language text and emerging approaches to topics like word representation.
AI presentation and introduction - Retrieval Augmented Generation RAG 101 – vincent683379
A brief introduction to generative AI and LLMs in particular.
An overview of the market and usages of LLMs.
What it's like to train and build a model.
Retrieval Augmented Generation 101, explained for non-experts, with a perspective on the moving parts that make it complex.
Webinar: Question Answering and Virtual Assistants with Deep Learning – Lucidworks
In this webinar, Lucidworks Data Scientists Sanket Shahane and Sava Kalbachou will look at how Deep Learning can be used to create Question Answering and Virtual Assistant type systems and the accuracy and performance of different approaches. We’ll even demo an insurance-industry question answering system scenario.
FriendsQA: Open-domain Question Answering on TV Show Transcripts – Jinho Choi
This thesis presents FriendsQA, a challenging question answering dataset that contains 1,222 dialogues and 10,610 open-domain questions, to tackle machine comprehension of everyday conversations. Each dialogue, involving multiple speakers, is annotated with six types of questions (what, when, why, where, who, how) regarding the dialogue contexts, and the answers are annotated as contiguous spans in the dialogue. A series of crowdsourcing tasks was conducted to ensure good annotation quality, resulting in a high inter-annotator agreement of 81.82%. Comprehensive annotation analytics are provided for a deeper understanding of this dataset. Three state-of-the-art QA systems, R-Net, QANet, and BERT, are evaluated on this dataset. BERT in particular shows promising results, with an accuracy of 74.2% for answer utterance selection and an F1-score of 64.2% for answer span selection, suggesting that the FriendsQA task is hard yet has great potential to elevate QA research on multiparty dialogue to another level.
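The span-selection F1 reported above is conventionally a token-overlap F1 between the predicted and gold answer spans (the SQuAD-style metric; this is a generic sketch of that convention, not code from the thesis):

```python
from collections import Counter

def span_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer span."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(span_f1("in the living room", "the living room"))  # 6/7, about 0.857
```

A system's score is then this F1 averaged over all questions, which rewards partially overlapping spans instead of requiring an exact match.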
Presentation of my work "Using Linked Data Traversal to Label Academic Communities" at the SAVE-SD workshop, co-located with the 24th International World Wide Web Conference at Florence, Italy
Personalized Learning: Expanding the Social Impact of AI – Peter Brusilovsky
Slides of my keynote talk at the SIAIA '23 workshop held at AAAI 2023:
The use of AI in education can be traced to the early days of AI. While the publicity around the most recent wave of AI applications rarely mentions education, it is through improving education that AI could achieve an impressive social impact. In particular, AI's ability to personalize the learning process can make a large difference in contexts where learners' knowledge differs radically from learner to learner. Modern computer and internet technologies can now bring the power of learning, in the form of MOOCs, online textbooks, and Zoom courses, truly worldwide. Yet without personalization, the potential of these technologies is not fully leveraged. In this talk, I will review several generations of research on personalized learning and discuss the tools, technologies, and infrastructures for personalized learning that we are currently exploring.
Beyond the Symbols: A 30-minute Overview of NLP – MENGSAYLOEM1
This presentation delves into the world of Natural Language Processing (NLP), exploring its goal to make human language understandable to machines. The complexities of language, such as ambiguity and complex structures, are highlighted as major challenges. The talk underscores the evolution of NLP through deep learning methodologies, leading to a new era defined by large-scale language models. However, obstacles like low-resource languages and ethical issues including bias and hallucination are acknowledged as enduring challenges in the field. Overall, the presentation provides a condensed, yet comprehensive view of NLP's accomplishments and ongoing hurdles.
Wholi: The right people find each other (at the right time)
Two key elements in this talk:
•PART 1: Machine learning for entity extraction
Natural language processing (NLP), information extraction
•PART 2: Matching profiles using deep learning classifier
Deep learning, word embeddings
Deep neural networks for matching online social networking profiles – Traian Rebedea
- Proposed a large dataset for matching online social networking profiles
- This allowed us to train a deep neural network for profile matching using both domain-specific features and word embeddings generated from textual descriptions in social profiles
- Experiments showed that the NN surpassed both unsupervised and supervised models, achieving high precision (P = 0.95) with a good recall rate (R = 0.85)
Detecting and Describing Historical Periods in a Large Corpora – Traian Rebedea
Many historic periods (or events) are remembered by slogans, expressions or words that are strongly linked to them. Educated people are also able to determine whether a particular word or expression is related to a specific period in human history. The present paper aims to establish correlations between significant historic periods (or events) and the texts written in that period. In order to achieve this, we have developed a system that automatically links words (and topics discovered using Latent Dirichlet Allocation) to periods of time in recent history. For this analysis to be relevant and conclusive, it must be undertaken on a representative set of texts written throughout history. To this end, instead of relying on manually selected texts, the Google Books Ngram corpus has been chosen as the basis for the analysis. Although it provides only word n-gram statistics for the texts written in a given year, the resulting time series can be used to provide insights about the most important periods and events in recent history, by automatically linking them with specific keywords or even LDA topics.
Practical Machine Learning - Part 1 contains:
- Basic notations of ML (what tasks are there, what is a model, how to measure performance)
- A couple of examples of problems and solutions (taken from previous work)
- A brief presentation of open-source software used for ML (R, scikit-learn, Weka)
Adjusting primitives for graph: SHORT REPORT / NOTES – Subhajit Sahu
Compressed Sparse Row (CSR) is an adjacency-list based graph representation used by graph algorithms such as PageRank.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
[Internal study session material: Octo: An Open-Source Generalist Robot Policy]
Intro to Deep Learning for Question Answering
1. Intro to Deep Learning for Question Answering
Traian Rebedea, Ph.D.
Department of Computer Science, UPB
traian.rebedea@cs.pub.ro / trebedea@gmail.com
2. About Me
• The Academic Part:
• Education
• B.Sc., “Politehnica” University of Bucharest, CS Dept., Romania
• M.Sc., “Politehnica” University of Bucharest, CS Dept., Romania
• Ph.D., Natural Language Processing & Technology-Enhanced Learning, “Politehnica” University of Bucharest, CS Dept., Romania
• Over 25 articles published at world-wide top conferences:
• http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/r/Rebedea:Traian.html
• 4 book chapters on NLP & Technology Enhanced Learning
• Jobs:
• Lecturer, “Politehnica” University of Bucharest, CS Dept., Romania
• Teaching Assistant, “Politehnica” University of Bucharest, CS Dept., Romania
Intro to Deep Learning for Question Answering – 30 January 2017
3. About Me
• The Industrial Part
• Jobs
• PeopleGraph, Bucharest, Romania – Researcher, Natural Language Processing, Machine Learning & Information Retrieval
• TeamNet, Bucharest, Romania – Research Consultant, Opinion Mining & Natural Language Processing
• Create IT, Bucharest, Romania – Founder & Web Developer
• ProSoft Solutions, Bucharest, Romania – Java Developer
• Various collaborations with other companies: Bitdefender, Adobe, Treeworks, UberVU
• Other
• Tutor for the Erasmus-Mundus DMKM Information Retrieval course (taught by Ricard Gavalda from UPC)
4. Overview
• Why question answering (QA)?
• Previous work in QA (before deep learning)
• Deep learning for QA (intro)
• Simple CNN
• Dependency tree – RNN
• LSTM-based solution
5. Why Question Answering?
• QA systems have been around for quite some time
• In the 60s-80s, mostly domain-dependent QA
• Quite related to conversational agents, at least at the beginning
• Open domain QA systems received larger attention in the 90s
• Combination of NLP and IR/IE techniques
• One of the most famous: MIT START system (http://start.csail.mit.edu/index.php)
• Wolfram Alpha (https://www.wolframalpha.com/)
• Advanced systems use a combination of “shallow” methods together with knowledge bases and more complex NLP methods
6. Why Question Answering?
• In the last 20 years, TREC and ACL have provided workshops and tracks for various flavors of QA tasks (closed- and open-domain)
• Lately, a large number of new datasets and tasks have become available, which have improved the performance of (open-domain) QA systems
• QALD: Question Answering over Linked Data (http://qald.sebastianwalter.org/)
• Given a knowledge base and a question in natural language, extract the correct answers from the knowledge base
• Small corpus: each year ~100 Q-A pairs for training and 100 for evaluation; over 6 years => ~600 Q-A pairs for training and 600 for evaluation
• Allen AI Question Answering (http://allenai.org/data.html)
• (Open-domain) QA task which contains questions asked of primary/secondary students on different topics (science, maths, etc.)
• Several datasets of ~400-1000 Q-A pairs
7. Why Question Answering?
• SQuAD – Stanford QA Dataset (https://rajpurkar.github.io/SQuAD-explorer/)
• Open-domain answer sentence selection
• 100,000+ Q-A pairs on 500+ articles
• VisualQA (http://www.visualqa.org/)
• Given an image and a question in natural language, provide the correct answer (open-domain)
• 600,000+ questions on more than 200,000 images
• MovieQA (http://movieqa.cs.toronto.edu/home/)
• Given a movie and a question in natural language, provide the correct answer (open-domain)
• Almost 15,000 multiple-choice question answers obtained from over 400 movies
• Several others
• Right now we are building a dataset similar to QALD, but aimed at answering questions from databases
8. Previous work in Question Answering
• Before deep learning / non-deep-learning approaches
• Use NLP techniques to find the best match between question and candidate answers: feature engineering or expensive semantic resources
• Lexical / IR: similarity measures (cosine + tf-idf, stemming, lemmatization, BM25, other retrieval models)
• Semantic: use non-neural word embeddings (e.g. Latent Semantic Analysis – LSA), use additional resources (linguistic ontologies – WordNet; other databases, thesauri, ontologies – Freebase, DBpedia)
• Syntactic: compute constituency/dependency trees of question and answer – try to align/match the two trees
• Mixed/other: string kernels, tree kernels, align/match based on both syntax and semantics, classifiers using a mix of the features discussed so far
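The lexical/IR baseline above (cosine over tf-idf vectors) can be sketched in a few lines. This is a generic illustration written for these notes, not the slides' code: build tf-idf vectors for the question and candidate answers, then rank candidates by cosine similarity.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build sparse tf-idf vectors (word -> weight) for a small collection."""
    docs = [t.lower().split() for t in texts]
    df = Counter(w for d in docs for w in set(d))      # document frequency
    N = len(docs)
    idf = {w: math.log(N / df[w]) for w in df}
    return [{w: c * idf[w] for w, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

question = "when did amtrak begin operations"
candidates = [
    "amtrak has not turned a profit since it was founded in 1971",
    "the weather in boston is cold in winter",
]
vecs = tfidf_vectors([question] + candidates)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
best = max(range(len(candidates)), key=lambda i: scores[i])
```

Here the first candidate wins only because it shares the rare token "amtrak" with the question; the deep learning models below are motivated precisely by cases where the correct answer shares few words with the question.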
9. Discussed Question Answering Tasks
• Answer sentence selection
• Given a question
• Several candidate sentences, some of which contain the answer (along with other material)
• Find the ones containing the answer
• Usually the sentences (answers) are longer than the questions
Q: When did Amtrak begin operations?
A: Amtrak has not turned a profit since it was founded in 1971.
• Factoid question answering (“quiz-bowl”)
• Given a longer description of the factoid answer
(usually an entity, event, etc.) – this is the question
• Find the entity as “fast” as possible – this is the answer
(using as little information – as few sentences/words
as possible – from the description)
• The question is longer than the answer
Q: (a multi-sentence description of the entity, shown on the slide)
A: Holy Roman Empire
10. Deep learning for QA (intro)
• Simple CNN
• Yu, Lei, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. "Deep learning for
answer sentence selection." arXiv preprint arXiv:1412.1632 (2014) (Oxford U. &
Google DeepMind)
• Extension (good study): Feng, Minwei, Bing Xiang, Michael R. Glass, Lidan Wang, and
Bowen Zhou. "Applying deep learning to answer selection: A study and an open
task." In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE
Workshop on, pp. 813-820. IEEE, 2015 (IBM Watson)
• Dependency tree – RNN
• Iyyer, Mohit, Jordan L. Boyd-Graber, Leonardo Max Batista Claudino, Richard Socher,
and Hal Daumé III. "A Neural Network for Factoid Question Answering over
Paragraphs." In EMNLP, pp. 633-644. 2014 (Maryland & Colorado & Stanford U.)
• LSTM-based solution
• Tan, Ming, Bing Xiang, and Bowen Zhou. "LSTM-based Deep Learning Models for
non-factoid answer selection." arXiv preprint arXiv:1511.04108 (2015) (IBM Watson)
11. Simple CNN for Answer Sentence Selection
• (qi, aij, yij)
• Binary classification problem
• qi – question
• aij – candidate sentences
• yij = 1 if aij contains the answer to question qi, 0 otherwise
• Assumption: correct answers have high semantic similarity to
questions
• No (actually few) hand-crafted features
• Focus on modeling questions and answers as vectors, and evaluate
the relatedness of each QA pair in a shared vector space
12. Simple CNN for Answer Sentence Selection
• Given that the QA pair is modelled in the same d-dimensional vector
space, the probability of the answer being correct is:
p(y = 1 | q, a) = σ(qᵀ M a + b)
• Intuition: transform the answer into the question space, q’ = M a,
then use the dot product to assess the similarity between q and q’
• Finally, the sigmoid function transforms the generated scores
(dot products are not normalized!) into a probability (a number between 0 and 1)
• Training by minimizing cross-entropy on training/labelled set
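The scoring step can be sketched in a few lines of NumPy. The vectors here are made up for illustration; in the paper, q and a come from the sentence models on the following slides, and M and b are learned parameters:

```python
import numpy as np

def match_probability(q, a, M, b=0.0):
    """p(y=1 | q, a) = sigmoid(q^T M a + b): project the answer into
    the question space (q' = M a), then score it against the question."""
    q_prime = M @ a                      # answer mapped into question space
    score = q @ q_prime + b              # unnormalized dot-product similarity
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid squashes the score to (0, 1)

# Toy 3-dimensional example with made-up vectors
rng = np.random.default_rng(0)
q = rng.standard_normal(3)
a = rng.standard_normal(3)
M = np.eye(3)  # identity transform: reduces to a plain dot product
p = match_probability(q, a, M)
```

Training then minimizes cross-entropy between these probabilities and the binary labels yij.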
13. Simple CNN for Answer Sentence Selection
• Bag-of-words model (simplest model, just embeddings, no NN here)
• Uses word embeddings for the vector space
• Then averages over all the words in the text (question or answer)
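The bag-of-words text vector is just the mean of the word embeddings. A minimal sketch with a hypothetical 2-dimensional embedding table (the paper uses pretrained 50-dimensional embeddings):

```python
import numpy as np

# Hypothetical toy embedding table (made-up 2-d vectors)
emb = {"when":   np.array([0.1, 0.3]),
       "did":    np.array([0.0, 0.2]),
       "amtrak": np.array([0.5, 0.1]),
       "begin":  np.array([0.2, 0.4])}

def bow_vector(tokens, emb):
    """Bag-of-words representation: the mean of the word embeddings."""
    return np.mean([emb[t] for t in tokens], axis=0)

q_vec = bow_vector(["when", "did", "amtrak", "begin"], emb)
# component-wise mean: [0.2, 0.25]
```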
14. Simple CNN for Answer Sentence Selection
• Bigram model
• Uses a simple CNN (Convolutional NN)
• Sensitive to word order
• Can capture information from n-grams
• The authors use only bigrams (adjacent words), but this can be extended to longer n-grams
• Use a single convolutional layer + average (sum) pooling layer
• Convolution vector (filter) is shared by all bigrams
15. Simple CNN for Answer Sentence Selection
• Convolutional filter combines adjacent words (bigrams)
• Then average pooling combines all bigram features
• In practice, we just need to learn how
to combine the embedding of the words
in the bigram
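The bigram model can be sketched as a single shared filter applied to every pair of adjacent word vectors, followed by average pooling (the filter weights here are random stand-ins for the learned parameters):

```python
import numpy as np

def bigram_cnn(words, W, b):
    """One convolutional filter shared across all bigrams: each pair of
    adjacent word vectors is combined into one feature vector, then
    average pooling merges all bigram features into the text vector."""
    feats = []
    for left, right in zip(words[:-1], words[1:]):
        pair = np.concatenate([left, right])   # bigram input, shape (2d,)
        feats.append(np.tanh(W @ pair + b))    # shared filter + nonlinearity
    return np.mean(feats, axis=0)              # average pooling over positions

d = 4
rng = np.random.default_rng(1)
words = [rng.standard_normal(d) for _ in range(5)]  # a 5-word sentence
W = rng.standard_normal((d, 2 * d)) * 0.1           # hypothetical filter weights
b = np.zeros(d)
s = bigram_cnn(words, W, b)   # sentence vector, shape (d,)
```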
16. Simple CNN for Answer Sentence Selection
• Experiments on Text Retrieval Conference (TREC) QA track (8-13)
datasets, with candidate answers automatically selected from each
question’s document pool
• Task: rank candidate answers given question (IR specific task)
• Assess using Mean Average Precision (MAP) and Mean Reciprocal
Rank (MRR)
17. What is MAP & MRR?
• IR metrics; more details here: https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf or any IR book
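For a quick intuition, both metrics can be computed in a few lines. Each ranked list is a sequence of 0/1 relevance judgments; AP averages precision at each relevant rank, RR is the inverse rank of the first relevant item, and MAP/MRR are their means over queries:

```python
def average_precision(relevance):
    """AP for one ranked list; relevance[i] = 1 if the item at rank i+1 is relevant."""
    hits, precisions = 0, []
    for i, rel in enumerate(relevance):
        if rel:
            hits += 1
            precisions.append(hits / (i + 1))  # precision at this rank
    return sum(precisions) / max(hits, 1)

def reciprocal_rank(relevance):
    """1 / rank of the first relevant item (0 if there is none)."""
    for i, rel in enumerate(relevance):
        if rel:
            return 1 / (i + 1)
    return 0.0

# Two ranked candidate-answer lists (1 = correct answer)
ranked = [[1, 0, 1], [0, 1, 0]]
mean_ap = sum(average_precision(r) for r in ranked) / len(ranked)
mrr = sum(reciprocal_rank(r) for r in ranked) / len(ranked)
# AP([1,0,1]) = (1/1 + 2/3)/2 = 5/6; AP([0,1,0]) = 1/2 => MAP = 2/3
# RR = 1 and 1/2 => MRR = 0.75
```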
18. Simple CNN for Answer Sentence Selection
• Experimental results
• Used precomputed word embeddings (d=50) – details in paper, embeddings available online
• Embeddings could be improved for this task, but dataset is small
• Other weights randomly initialised using a Gaussian distribution
• All hyperparameters were optimised via grid search
• AdaGrad for training
• They also added some hand-crafted features (there is a justification in the paper, though not very convincing):
• word co-occurrence count between Q & A
• word co-occurrence count weighted by IDF between Q & A
• These features, together with the QA matching probability provided by the distributional model (CNN), are used to
train a logistic regression classifier
19. Simple CNN for Answer Sentence Selection
• Results were encouraging
• Co-occurrence features are important
• Distributional model can assess semantics
20. Dependency Tree – RNN
• Solution proposed for factoid “quiz bowl” QA
• Use a dependency tree recursive neural network (DT-RNN)
• Extend it to combine predictions across sentences to produce a
question answering neural network with trans-sentential averaging
(called QANTA)
21. Dependency Tree – RNN
• Dependency trees are used to model syntax in NLP
• Two main types of (syntactic) parse trees: constituency and
dependency
• Dependencies are actually directed edges between words
22. Dependency Tree – RNN
• DT-RNN is just briefly explained in the paper
• More details are available in another paper:
http://nlp.stanford.edu/~socherr/SocherKarpathyLeManningNg_TACL2013.pdf
• Key elements: original word embeddings, hidden representation for words (of the
same size as the original embeddings), one transformation for each dependency
type in the hidden space
For leaf nodes: hn = f(Wv · xn + b)
For inner nodes: hn = f(Wv · xn + b + Σk∈children(n) WR(n,k) · hk)
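The node composition can be sketched directly from those formulas. The dependency relation names and all weights below are illustrative stand-ins; in the model, one matrix per dependency type is learned:

```python
import numpy as np

def dtrnn_node(x_word, children, W_v, W_dep, b):
    """Hidden vector of a dependency-tree node: transform the node's own
    word embedding, then add a dependency-type-specific transform of each
    child's hidden vector. Leaf nodes simply have no children."""
    total = W_v @ x_word + b
    for dep_type, h_child in children:
        total = total + W_dep[dep_type] @ h_child
    return np.tanh(total)  # f = tanh

d = 3
rng = np.random.default_rng(2)
W_v = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)
# One matrix per dependency relation (hypothetical relation names)
W_dep = {"amod": rng.standard_normal((d, d)) * 0.1,
         "nsubj": rng.standard_normal((d, d)) * 0.1}

x = {w: rng.standard_normal(d) for w in ["empire", "holy", "roman"]}
h_holy = dtrnn_node(x["holy"], [], W_v, W_dep, b)    # leaf: no children
h_roman = dtrnn_node(x["roman"], [], W_v, W_dep, b)  # leaf: no children
h_root = dtrnn_node(x["empire"],
                    [("amod", h_holy), ("amod", h_roman)],
                    W_v, W_dep, b)
```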
23. Dependency Tree – RNN
• Example
(a simpler formula is used for the inner nodes)
24. Dependency Tree – RNN
• Training: limit the number of possible answers => problem viewed as a multi-class
classification task
• Softmax can be used for the decision in the final layer by using features from question
and answer
• Improvement: word vectors associated with answers to be trained in the same vector
space as the question text
• Train both the answers and questions jointly in a single model
• Encourage vectors of question sentences to be near their correct answers and far away
from incorrect answers
• => Can use hinge loss
• => “While we are not interested in obtaining a ranked list of answers, we observe better
performance by adding the weighted approximate-rank pairwise (WARP) loss”
25. Dependency Tree – RNN
• Correct answer c
• Randomly sample j incorrect answers from the set of all incorrect answers
and denote this subset as Z
• S – set of all nodes in a dependency tree
• Cost / loss function is WARP – a variation of hinge loss
• More details on how to approximate L(rank(c, s, Z)) are in section 3.2 of the paper
• Training using backpropagation through structure
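A rough sketch of a WARP-weighted hinge loss for one node representation, under simplifying assumptions (the exact rank approximation used by the authors is the one in section 3.2; here the rank is simply the number of margin violations, and L(rank) is the usual harmonic weighting):

```python
import numpy as np

def warp_loss(h_s, x_c, x_wrong, margin=1.0):
    """WARP-style weighted hinge loss: the correct answer vector x_c
    should score at least `margin` higher against h_s than every sampled
    incorrect answer; violations are weighted by L(rank) = sum_{i<=rank} 1/i."""
    correct_score = h_s @ x_c
    violations = [max(0.0, margin - correct_score + h_s @ x_z)
                  for x_z in x_wrong]
    rank = sum(v > 0 for v in violations)        # approximate rank of c
    L = sum(1.0 / i for i in range(1, rank + 1)) # rank-dependent weight
    return L * sum(violations)

rng = np.random.default_rng(3)
d = 4
h_s = rng.standard_normal(d)                     # one dependency-tree node
x_c = rng.standard_normal(d)                     # correct answer vector
x_wrong = [rng.standard_normal(d) for _ in range(5)]  # j sampled negatives
loss = warp_loss(h_s, x_c, x_wrong)
```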
26. Dependency Tree – RNN
• QANTA: Previous model + average the representations of each sentence seen so far in a
particular question
• This was the best aggregation found by the authors
• Datasets:
• History questions: training set of 3,761 questions with 14,217 sentences and a test set of 699
questions with 2,768 sentences
• Literature questions: training set of 4,777 questions with 17,972 sentences and a test set of 908
questions with 3,577 sentences
• 451 history answers and 595 literature answers that occur on average twelve times in the corpus
• Word embeddings (We): word2vec trained on the preprocessed question text in our
training set, then optimized in the current model
• Embedding size: 100, num incorrect sampled answers: 100
27. Dependency Tree – RNN
• Results on test sets
• Several baselines, including comparison with all the text in Wikipedia page for the
answer
• Also comparison with human players, after the first sentence in the question
28. Dependency Tree – RNN
29. LSTM Solution for Question Answering
• Work on answer sentence selection
• Use a sequence NN model to model the representation of Q&A
• LSTM is the obvious choice
30. LSTM Solution for Question Answering
• Use a bidirectional LSTM (BiLSTM)
• Captures both the previous and future context by processing the sequence in two
directions
• Generate two independent sequences of LSTM output vectors
• One processes the input sequence forward, and one backward
• The input sequence contains the word embeddings for the analyzed
text (Q&A)
• Output at each step contains the concatenation of the output vectors
for both directions
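The bidirectional wiring can be sketched as follows. To keep the sketch short, a vanilla tanh recurrence stands in for the LSTM cell; the two directions have independent parameters, and their outputs are concatenated per timestep:

```python
import numpy as np

def simple_rnn(X, W, U, b):
    """Plain recurrent pass (an LSTM cell in the paper; a vanilla RNN
    here for brevity): returns one hidden output per timestep."""
    h = np.zeros(W.shape[0])
    outs = []
    for x in X:
        h = np.tanh(W @ h + U @ x + b)
        outs.append(h)
    return np.array(outs)

def bidirectional(X, params_fwd, params_bwd):
    """Run the sequence forward and backward independently, then
    concatenate the two output vectors at each timestep."""
    fwd = simple_rnn(X, *params_fwd)
    bwd = simple_rnn(X[::-1], *params_bwd)[::-1]  # re-align to input order
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(6)
T, d_in, d_h = 5, 3, 4
X = rng.standard_normal((T, d_in))   # word embeddings of the analyzed text
make = lambda: (rng.standard_normal((d_h, d_h)) * 0.1,
                rng.standard_normal((d_h, d_in)) * 0.1,
                np.zeros(d_h))
H = bidirectional(X, make(), make())  # shape (T, 2 * d_h)
```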
31. LSTM Solution for Question Answering
• Basic QA-LSTM model
• Compute BiLSTM representation for Q&A, then use a pooling method and cosine similarity for
comparison
• Dropout on the last layer, before cosine
• Hinge loss for training
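The basic QA-LSTM pipeline (pooling, cosine similarity, hinge loss) can be sketched like this; random matrices stand in for the BiLSTM output sequences, and the margin value is illustrative:

```python
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pool(outputs, method="max"):
    """Collapse a sequence of BiLSTM output vectors into a single vector."""
    return outputs.max(axis=0) if method == "max" else outputs.mean(axis=0)

def hinge_loss(q, a_pos, a_neg, margin=0.2):
    """The positive answer's cosine similarity to the question should
    exceed a negative answer's similarity by at least the margin."""
    return max(0.0, margin - cosine(q, a_pos) + cosine(q, a_neg))

# Stand-ins for BiLSTM output sequences: (timesteps, 2 * hidden) matrices
rng = np.random.default_rng(4)
Q = rng.standard_normal((6, 8))       # question, 6 tokens
A_pos = rng.standard_normal((20, 8))  # correct answer (longer than question)
A_neg = rng.standard_normal((20, 8))  # sampled incorrect answer
loss = hinge_loss(pool(Q), pool(A_pos), pool(A_neg))
```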
32. LSTM Solution for Question Answering
• The best model is obtained when the Q & A sides share the same network parameters
• Significantly better than the variant where the question and answer sides
have their own parameters
• It also converges much faster
33. LSTM Solution for Question Answering
• First improvement: QA-LSTM/CNN
• Put a CNN on top of the outputs of the BiLSTM
• With filter size m, each filter combines m consecutive BiLSTM output vectors (formula in the paper)
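One possible reading of that step, sketched with a single filter and 1-max pooling over positions (the filter weights are hypothetical; the paper learns many such filters):

```python
import numpy as np

def conv_over_outputs(H, F, m):
    """One convolutional filter of width m over the BiLSTM output
    sequence H (shape (T, d)): each window of m consecutive output
    vectors is flattened and scored, then 1-max pooling keeps the
    strongest response across positions."""
    T = H.shape[0]
    scores = [np.tanh(F @ H[t:t + m].ravel()) for t in range(T - m + 1)]
    return max(scores)

rng = np.random.default_rng(7)
T, d, m = 12, 6, 2
H = rng.standard_normal((T, d))       # stand-in for BiLSTM outputs
F = rng.standard_normal(m * d) * 0.1  # hypothetical filter weights
feat = conv_over_outputs(H, F, m)     # one pooled convolutional feature
```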
34. LSTM Solution for Question Answering
• “The intuition of this structure is, instead of evenly considering the
lexical information of each token as the previous subsection, we
emphasize on certain parts of the answer, such that QA-LSTM/CNN
can more effectively differentiate the ground truths and incorrect
answers.”
35. LSTM Solution for Question Answering
• Second improvement: Attention-based QA-LSTM
• “The fixed width of hidden vectors becomes a bottleneck, when the
bidirectional LSTM models must propagate dependencies over long
distances over the questions and answers.
• An attention mechanism is used to alleviate this weakness by dynamically
aligning the more informative parts of answers to the questions.”
• Simple attention mechanism over the basic QA-LSTM model
• Prior to pooling, each biLSTM output vector for the answer will be
multiplied by a softmax weight, which is determined by the question
embedding from biLSTM
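The shape of this attention step can be sketched as follows; the parameter names and random weights are assumptions for illustration, but the flow matches the description above: each answer output vector gets a softmax weight driven by the pooled question vector:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(H_a, o_q, W_am, W_qm, w_ms):
    """Attention over answer BiLSTM outputs H_a (shape (T, d)): compute
    an attention feature per timestep from the answer output and the
    question vector o_q, turn the features into softmax weights, then
    reweight the answer outputs before pooling."""
    m = np.tanh(H_a @ W_am.T + o_q @ W_qm.T)  # (T, d) attention features
    s = softmax(m @ w_ms)                      # (T,) weights, sum to 1
    return H_a * s[:, None]                    # reweighted answer outputs

rng = np.random.default_rng(5)
T, d = 10, 6
H_a = rng.standard_normal((T, d))   # stand-in answer BiLSTM outputs
o_q = rng.standard_normal(d)        # pooled question representation
W_am = rng.standard_normal((d, d)) * 0.1
W_qm = rng.standard_normal((d, d)) * 0.1
w_ms = rng.standard_normal(d)
H_tilde = attend(H_a, o_q, W_am, W_qm, w_ms)
```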
36. LSTM Solution for Question Answering
• Conceptually, the attention mechanism gives more weight to certain
words, somewhat like a per-word tf-idf weighting
• But it computes the weights according to the question information
37. LSTM Solution for Question Answering
• Experiment 1: InsuranceQA
• Grid search for hyper-parameter tuning
• Word embeddings are initialized using word2vec, size 100, and are
further optimized during training
• The LSTM output vector size is 141 for each direction
• Also tried various norms
• SGD training
38. LSTM Solution for Question Answering
• QA-LSTM compared against several baselines
• Metric is accuracy
39. LSTM Solution for Question Answering
• Models’ performance by ground-truth answer length
40. LSTM Solution for Question Answering
• TREC-QA results
41. CNN for QA – extended study
• Feng, Minwei, Bing Xiang, Michael R. Glass, Lidan Wang, and Bowen Zhou. "Applying deep
learning to answer selection: A study and an open task." In Automatic Speech Recognition and
Understanding (ASRU), 2015 IEEE Workshop on, pp. 813-820. IEEE, 2015 – online here:
https://arxiv.org/pdf/1508.01585.pdf
• Proposes several CNN architectures for QA
42. References
• [1] Yu, Lei, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. "Deep learning for answer sentence selection." arXiv preprint arXiv:1412.1632 (2014). Online: https://arxiv.org/pdf/1412.1632.pdf
• [2] Iyyer, Mohit, Jordan L. Boyd-Graber, Leonardo Max Batista Claudino, Richard Socher, and Hal Daumé III. "A Neural Network for Factoid Question Answering over Paragraphs." In EMNLP, pp. 633-644. 2014. Online: https://cs.umd.edu/~miyyer/pubs/2014_qb_rnn.pdf
• [3] Tan, Ming, Bing Xiang, and Bowen Zhou. "LSTM-based Deep Learning Models for non-factoid answer selection." arXiv preprint arXiv:1511.04108 (2015). Online: https://arxiv.org/pdf/1511.04108v4.pdf
• [4] Feng, Minwei, Bing Xiang, Michael R. Glass, Lidan Wang, and Bowen Zhou. "Applying deep learning to answer selection: A study and an open task." In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on, pp. 813-820. IEEE, 2015. Online: https://arxiv.org/pdf/1508.01585.pdf