Visual Question-Answering
▪ Team Members:
▪ Abdalla Shaaban Elsayed
▪ Rabah Jamal Mohammed Ali
▪ Abdullah Abdelkader Roshdy
▪ Abdullah Mahmoud Abdullah
▪ Supervisor:
▪ Dr Sally Saad
▪ TA Ahmed Salah
Outline
▪ Introduction
▪ Motivation
▪ Problem definition
▪ Objective
▪ Working Phases
▪ Time Plan
▪ Tools
Visual Question-Answering
Predict the answer to a given question related to an image.
Motivation
▪ Performing complex activities.
▪ Merging two or more sub-problems.
▪ Understanding:
- Convolutional neural networks
- Natural language processing
- Recurrent neural networks
▪ Obtaining high accuracy from a complex model.
Types of Visual Question-Answering
Problem definition
How can we build a model that extracts the features of an image that are related to a given question?
Objectives
▪ Build a visual question answering system using the hierarchical co-attention technique.
▪ We aim to slightly improve the results of a system that takes a question and an image as input and outputs an answer based on how the RCNN understands the question asked.
Phases Diagram
▪ Data Preprocessing
▪ Model Building
▪ Model Testing and Validation
▪ Model Interface
Phases overview | Data preprocessing
▪ Gathering Datasets
• VQA Dataset
• COCO-QA Dataset
▪ Preparing Dataset
• Cleaning the dataset using NLTK.
• Text representation using word embedding.
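The two preparation steps can be sketched as follows. This is a minimal illustration only: a standard-library regular expression stands in for an NLTK tokenizer so the snippet runs without extra downloads, and the function names, the `<pad>`/`<unk>` tokens, and the randomly initialised embedding table are assumptions that do not come from the slides. In the actual system the embeddings would presumably be learned together with the model.

```python
import random
import re
from collections import Counter


def clean_question(text):
    """Lowercase a question and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def build_vocab(token_lists, min_count=1):
    """Map each sufficiently frequent word to an integer id (0 = padding, 1 = unknown)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for word in sorted(counts):
        if counts[word] >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or padding to exactly max_len entries."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def init_embeddings(vocab_size, dim, seed=0):
    """Create a random embedding table with one dim-sized vector per vocabulary id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


# Toy questions (hypothetical, not taken from the VQA or COCO-QA datasets).
questions = ["What color is the cat?", "Is the cat sleeping?"]
tokens = [clean_question(q) for q in questions]
vocab = build_vocab(tokens)
encoded = [encode(t, vocab, max_len=6) for t in tokens]
table = init_embeddings(len(vocab), dim=4)
embedded = [[table[i] for i in ids] for ids in encoded]
```

Each question ends up as a fixed-length list of word ids and, after the table lookup, as a fixed-length list of vectors that a model can consume.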
Phases overview | Model Building
▪ Image Feature Extraction
▪ Question Hierarchy
▪ Co-Attention
▪ Encoding for Predicting Answers
Phases overview | Model Building
▪ The model extracts word-level, phrase-level, and question-level embeddings of the question. At each level, it applies co-attention to both the image and the question. The final answer prediction is based on all the co-attended image and question features.
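The idea of attending to the image and the question at each level can be sketched in plain Python. This is a deliberately simplified illustration, not the exact formulation from the referenced papers: it has no learned weights, it scores each question position by its best-matching image region (and vice versa), and all function names and toy feature values are assumptions.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Combine vectors into one vector using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def co_attention(question_feats, image_feats):
    """Return (attended_question, attended_image, question_weights, image_weights)."""
    # Affinity between every question position and every image region.
    affinity = [[dot(q, v) for v in image_feats] for q in question_feats]
    # Score each question position by its best region, and each region by its best position.
    q_weights = softmax([max(row) for row in affinity])
    v_weights = softmax([max(col) for col in zip(*affinity)])
    return (weighted_sum(q_weights, question_feats),
            weighted_sum(v_weights, image_feats),
            q_weights, v_weights)


def hierarchical_features(levels, image_feats):
    """Apply co-attention at every level and concatenate all attended vectors."""
    combined = []
    for question_feats in levels:  # word level, phrase level, question level
        q_att, v_att, _, _ = co_attention(question_feats, image_feats)
        combined.extend(q_att + v_att)
    return combined


# Toy 2-dimensional features (hypothetical values).
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
word_level = [[1.0, 0.0], [0.0, 2.0]]
phrase_level = [[0.5, 0.5], [0.0, 1.0]]
question_level = [[0.2, 0.8]]

_, _, q_w, v_w = co_attention(word_level, image_feats)
combined = hierarchical_features([word_level, phrase_level, question_level], image_feats)
```

The concatenated vector `combined` plays the role of the input to the final answer classifier; in a real model the features at each level would come from trained layers rather than fixed lists.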
Phases overview | Model Testing
▪ Measuring the system’s accuracy and the level of correctness of the predicted answers.
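Two ways of scoring predicted answers are sketched below. The slides do not say which metric the team uses, so both are assumptions: plain exact-match accuracy, and the consensus-style score commonly reported on the VQA dataset, where a prediction counts as fully correct if at least three human annotators gave the same answer. The function names are illustrative.

```python
def normalize(answer):
    """Lowercase an answer and collapse surrounding and repeated whitespace."""
    return " ".join(answer.lower().strip().split())


def exact_match_accuracy(predictions, ground_truths):
    """Fraction of predictions equal to their single ground-truth answer."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    correct = sum(normalize(p) == normalize(g)
                  for p, g in zip(predictions, ground_truths))
    return correct / len(predictions)


def consensus_accuracy(prediction, human_answers):
    """Score one prediction as min(number of matching human answers / 3, 1)."""
    matches = sum(normalize(prediction) == normalize(a) for a in human_answers)
    return min(matches / 3.0, 1.0)


# Toy usage with made-up answers.
overall = exact_match_accuracy(["Red", "two"], ["red", "three"])
partial = consensus_accuracy("red", ["red"] * 2 + ["maroon"] * 8)
full = consensus_accuracy("red", ["red"] * 5 + ["maroon"] * 5)
```

The consensus score gives partial credit when only one or two annotators agree with the prediction, which softens the penalty on questions with several plausible answers.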
Phases overview | Interface
▪ Build a user interface for the system, which allows the user to interact with the system.
▪ Built with a Python framework, with CSS and JavaScript (optional).
Time Plan
Tools
▪ Languages:
- Python for preprocessing the datasets.
- JavaScript for the UI (optional).
▪ Libraries and Frameworks:
- NLTK and Pillow (Python Imaging Library) for preprocessing the dataset.
- TensorFlow to build the model.
Questions
References
▪ Chenyue Meng and Yixin Wang, “Image-Question-Linguistic Co-Attention for Visual Question Answering”, 2016.
▪ Alisha Rege and Payal Bajaj C, “From Vision to NLP: A Merge”, 2017.
▪ Ronghang Hu, Jacob Andreas, and Marcus Rohrbach, “Learning to Reason: End-to-End Module Networks for Visual Question Answering”, 2017.
▪ Jiasen Lu, Jianwei Yang, and Dhruv Batra, “Hierarchical Question-Image Co-Attention for Visual Question Answering”, 2017.
Thank You!
