Vqa seminar (1)

Visual Question-Answering
▪ Team Members:
- Abdalla Shaaban Elsayed
- Rabah Jamal Mohammed Ali
- Abdullah Abdelkader Roshdy
- Abdullah Mahmoud Abdullah
▪ Supervisors:
- Dr Sally Saad
- TA Ahmed Salah

Outline
▪ Introduction
▪ Motivation
▪ Problem definition
▪ Objective
▪ Working Phases
▪ Time Plan
▪ Tools

Visual Question-Answering
Predict the answer to a given question related to an image.

Motivation
▪ Performing complex tasks.
▪ Combining two or more sub-problems (understanding the image and understanding the question).
▪ Understanding:
- Convolutional neural networks
- Natural language processing
- Recurrent neural networks
▪ Obtaining high accuracy from a complex model.

Types of Visual Question-Answering

Problem definition
How can we build a model that extracts the features of an image that are relevant to a given question?

Objectives
▪ Build a visual question answering system using the hierarchical co-attention technique.
▪ We aim to slightly improve the results with a system that takes a question and an image as input and outputs an answer, based on how the RCNN understands the question asked.

Phases Diagram
Data Preprocessing → Model Building → Model Testing and Validation → Model Interface

Phases overview | Data preprocessing
Gathering datasets
• VQA Dataset
• COCO-QA Dataset
Preparing the dataset
• Cleaning the dataset using NLTK.
• Text representation using word embeddings.
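The "Preparing the dataset" step can be sketched as follows. This is a minimal, dependency-free sketch: a regular-expression tokenizer stands in for NLTK's tokenizer, and the helper names (`clean_question`, `build_vocab`, `encode`) and the padding length are hypothetical choices for illustration. The fixed-length integer sequences it produces are what a word-embedding layer (for example TensorFlow's `tf.keras.layers.Embedding`) would then map to dense vectors.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"

def clean_question(text):
    """Lowercase, strip punctuation, and split into word tokens.
    Assumption: a simple regex stands in for NLTK's word_tokenize."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)  # keep letters, digits, apostrophes
    return text.split()

def build_vocab(questions, min_count=1):
    """Map each word seen at least min_count times to an integer id.
    Ids 0 and 1 are reserved for padding and unknown words."""
    counts = Counter(tok for q in questions for tok in clean_question(q))
    vocab = {PAD: 0, UNK: 1}
    for word, count in sorted(counts.items()):
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(question, vocab, max_len=8):
    """Convert a question to a fixed-length list of word ids (truncate or pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in clean_question(question)]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

questions = ["What color is the cat?", "How many cats are on the bed?"]
vocab = build_vocab(questions)
print(encode("What color is the dog?", vocab))  # [12, 6, 8, 11, 1, 0, 0, 0]
```

The unseen word "dog" maps to the unknown id 1, and the sequence is padded with 0 up to the assumed maximum length.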

Phases overview | Model Building
▪ Image Feature Extraction
▪ Question Hierarchy
▪ Co-Attention
▪ Encoding for Predicting Answers
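The "Question Hierarchy" step can be illustrated with the phrase level described by Lu et al. for hierarchical question-image co-attention: word embeddings are convolved with unigram, bigram, and trigram filters, and the three responses are max-pooled at each word position. The sketch below is a pure-Python illustration with toy sizes and random weights standing in for trained parameters; the helper names are hypothetical, the end-of-sequence zero padding is an assumption, and the question level (an LSTM over these phrase vectors) is omitted.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def ngram_conv(words, weights, n):
    """Compute tanh(W_n [w_t; ...; w_{t+n-1}]) at every word position t.

    The sequence is zero-padded at the end (an assumption) so that the
    output has one vector per word, whatever the window size n.
    """
    dim = len(words[0])
    padded = words + [[0.0] * dim for _ in range(n - 1)]
    out = []
    for t in range(len(words)):
        window = [x for vec in padded[t:t + n] for x in vec]  # concatenate n word vectors
        out.append([math.tanh(v) for v in matvec(weights, window)])
    return out

def phrase_features(words, filters):
    """Phrase-level features: element-wise max over the unigram, bigram and
    trigram responses at each word position."""
    responses = [ngram_conv(words, filters[n], n) for n in (1, 2, 3)]
    return [
        [max(resp[t][k] for resp in responses) for k in range(len(responses[0][t]))]
        for t in range(len(words))
    ]

# Toy sizes and random weights stand in for trained parameters (hypothetical).
rng = random.Random(0)
d_in, d_out, num_words = 4, 3, 5
filters = {
    n: [[rng.uniform(-0.5, 0.5) for _ in range(n * d_in)] for _ in range(d_out)]
    for n in (1, 2, 3)
}
words = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(num_words)]
phrases = phrase_features(words, filters)
print(len(phrases), len(phrases[0]))  # 5 3
```

Max-pooling across window sizes lets each position keep whichever n-gram reading responds most strongly, while the output length still matches the question length.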

Phases overview | Model Building
▪ The model extracts word-level, phrase-level, and question-level embeddings. At each level, it applies co-attention to both the image and the question. The final answer prediction is based on all of the co-attended image and question features.
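One level of this co-attention can be sketched with the parallel co-attention form from Lu et al.: an affinity matrix C = tanh(Qᵀ W_b V) links every question word to every image region, then H_v = tanh(W_v V + (W_q Q) C) and H_q = tanh(W_q Q + (W_v V) Cᵀ) are scored by vectors w_hv and w_hq and softmax-normalized into attention weights. To keep the sketch short and dependency-free, W_b, W_v and W_q are assumed to be the identity and the scoring vectors are passed in directly; in a trained model all of these would be learned weights. The function name `parallel_co_attention` and the toy inputs are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    """Sum of vectors scaled by their attention weights."""
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(len(vectors[0]))]

def parallel_co_attention(regions, words, w_hv, w_hq):
    """Simplified parallel co-attention with W_b, W_v and W_q fixed to identity.

    regions: N image-region feature vectors; words: T question feature vectors;
    all vectors share the same size d. Returns (v_hat, q_hat, a_v, a_q).
    """
    d = len(regions[0])
    # Affinity matrix C[t][n] = tanh(q_t . v_n): how well word t matches region n.
    c = [[math.tanh(dot(q, v)) for v in regions] for q in words]
    # H_v = tanh(V + Q C): each region plus the question words weighted by affinity.
    h_v = [[math.tanh(v[k] + sum(c[t][n] * words[t][k] for t in range(len(words))))
            for k in range(d)] for n, v in enumerate(regions)]
    # H_q = tanh(Q + V C^T): each word plus the image regions weighted by affinity.
    h_q = [[math.tanh(q[k] + sum(c[t][n] * regions[n][k] for n in range(len(regions))))
            for k in range(d)] for t, q in enumerate(words)]
    a_v = softmax([dot(w_hv, h) for h in h_v])  # attention over image regions
    a_q = softmax([dot(w_hq, h) for h in h_q])  # attention over question words
    return weighted_sum(a_v, regions), weighted_sum(a_q, words), a_v, a_q

regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three toy image regions
words = [[1.0, 0.0], [0.2, 0.1]]                # two toy question words
v_hat, q_hat, a_v, a_q = parallel_co_attention(regions, words, [1.0, 1.0], [1.0, 1.0])
print([round(a, 3) for a in a_v])
```

In the hierarchical model this step would run at the word, phrase, and question levels, and the attended image and question vectors from all three levels would feed the answer predictor.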

Phases overview | Model Testing
▪ Measuring the system's accuracy and the correctness of the predicted answers.
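For the VQA dataset, accuracy is usually not plain exact match: each question comes with 10 human answers, and a prediction scores min(#humans who gave that answer / 3, 1), averaged over the 10 subsets that leave one human answer out. The sketch below implements that rule with a simplified answer normalization (lowercasing and trimming only); the function names are hypothetical and this is not the official evaluation script.

```python
from itertools import combinations

def normalize(answer):
    # Assumption: only case and surrounding whitespace are normalized; the
    # official VQA evaluation also normalizes punctuation, articles and numbers.
    return answer.strip().lower()

def vqa_accuracy(predicted, human_answers):
    """min(#matching humans / 3, 1), averaged over every leave-one-out subset."""
    pred = normalize(predicted)
    answers = [normalize(a) for a in human_answers]
    scores = []
    for subset in combinations(answers, len(answers) - 1):
        matches = sum(1 for a in subset if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)

def dataset_accuracy(predictions, all_human_answers):
    """Mean per-question accuracy over the evaluation set."""
    per_question = [vqa_accuracy(p, h) for p, h in zip(predictions, all_human_answers)]
    return sum(per_question) / len(per_question)

humans = ["2", "2"] + ["3"] * 8           # 10 annotator answers for one question
print(round(vqa_accuracy("2", humans), 2))  # 0.6
print(vqa_accuracy("3", humans))            # 1.0
print(round(dataset_accuracy(["3", "2"], [humans, humans]), 2))  # 0.8
```

The soft score gives partial credit to answers that only some annotators chose, which suits questions with more than one plausible answer.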

Phases overview | Interface
▪ Build a user interface that lets users interact with the system.
▪ Use a Python web framework with CSS and JavaScript (optional).

Tools
▪ Languages:
- Python for preprocessing the datasets.
- JavaScript for the UI (optional).
▪ Libraries and Frameworks:
- NLTK and Pillow (Python Imaging Library) for preprocessing the dataset.
- TensorFlow to build the model.
