Visual Question-Answering
▪ Team Members:
▪ Abdalla Shaaban Elsayed
▪ Rabah Jamal Mohammed Ali
▪ Abdullah Abdelkader Roshdy
▪ Abdullah Mahmoud Abdullah
▪ Supervisors:
▪ Dr Sally Saad
▪ TA Ahmed Salah
Motivation
▪ Performing complex activities.
▪ Combining two or more sub-problems.
▪ Understanding:
- computer vision
- natural language processing
- recurrent neural networks
▪ Obtaining high accuracy from a complex model.
Outline
▪ Introduction
▪ Motivation
▪ Problem definition
▪ Objective
▪ Background and survey
▪ Proposed solution
▪ Tools
▪ Work plan
Objective
▪ This project attempts to combine computer vision and natural language
processing to create a visual question answering system.
▪ The system takes a question and an image as input and outputs an
answer based on how the RCNN understands the question asked. We aim to
slightly improve on existing results.
Tools and techniques
▪ Languages:
- Python for preprocessing the datasets.
- JavaScript for the UI.
▪ Libraries and frameworks:
- NLTK and Pillow (Python Imaging Library) for preprocessing the dataset.
- TensorFlow to build the model.
- A Python web framework (Flask or Django) to build the web application.
Phases overview | Data preprocessing
▪ Gathering datasets:
- VQA dataset: the largest dataset for this problem, containing
human-annotated questions and answers on the Microsoft COCO dataset.
- COCO-QA dataset: automatically generated from captions in the Microsoft
COCO dataset.
▪ Preparing the dataset:
- Cleaning the dataset using NLTK.
- Representing the question text using word embeddings.
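The preparation step above can be illustrated with a minimal sketch: questions are cleaned, split into tokens, and mapped to integer ids that an embedding layer could later consume. This is only an illustration under stated assumptions. A plain regular expression stands in for NLTK so the snippet runs without downloading tokenizer data, and the names `tokenize`, `build_vocab`, `encode`, and the `<pad>`/`<unk>` tokens are hypothetical, not taken from the project.

```python
import re
from collections import Counter

# Hypothetical special tokens: padding and out-of-vocabulary words.
PAD, UNK = "<pad>", "<unk>"


def tokenize(question):
    """Lowercase the question and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def build_vocab(questions, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for q in questions for tok in tokenize(q))
    vocab = {PAD: 0, UNK: 1}
    for token, count in sorted(counts.items()):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(question, vocab, max_len=8):
    """Convert a question to a fixed-length list of token ids."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(question)][:max_len]
    # Pad short questions so every encoded question has the same length.
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["What color is the cat?", "Is the cat sleeping?"])
print(encode("Is the dog sleeping?", vocab, max_len=6))
```

Words that never appeared in the training questions (here "dog") map to the `<unk>` id, and the resulting id sequences are what a word-embedding layer would take as input.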
Phases overview | Model building
▪ Studying recurrent neural networks and convolutional neural networks.
▪ Studying the TensorFlow library.
▪ The model is based on:
- Recurrent neural network: reads the input question as tokens and
predicts the output answer.
- Convolutional neural network: reads the input image and produces a
feature vector.
▪ Training the model on the dataset.
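A two-branch model of the kind described on this slide could be sketched in TensorFlow's Keras API as follows. This is a hedged illustration, not the project's actual architecture: the layer sizes, the use of an LSTM as the recurrent layer, the fusion of the two branches by concatenation, and the treatment of answering as classification over a fixed answer set are all assumptions, and `build_vqa_model` is a hypothetical name.

```python
import tensorflow as tf


def build_vqa_model(vocab_size=1000, max_question_len=20,
                    num_answers=100, image_size=64):
    """Build a small CNN + RNN model (all sizes are illustrative assumptions)."""
    # Image branch: a small CNN that turns the image into a feature vector.
    image_in = tf.keras.Input(shape=(image_size, image_size, 3), name="image")
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(image_in)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    image_features = tf.keras.layers.GlobalAveragePooling2D()(x)

    # Question branch: token ids -> embeddings -> recurrent layer.
    question_in = tf.keras.Input(shape=(max_question_len,), dtype="int32",
                                 name="question")
    q = tf.keras.layers.Embedding(vocab_size, 64, mask_zero=True)(question_in)
    question_features = tf.keras.layers.LSTM(64)(q)

    # Fusion (assumed): concatenate both vectors, then classify the answer.
    merged = tf.keras.layers.Concatenate()([image_features, question_features])
    hidden = tf.keras.layers.Dense(128, activation="relu")(merged)
    answer = tf.keras.layers.Dense(num_answers, activation="softmax")(hidden)

    model = tf.keras.Model(inputs=[image_in, question_in], outputs=answer)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_vqa_model()
model.summary()
```

Training would then call `model.fit` with image arrays, padded question ids, and integer answer labels. In practice the small CNN here would likely be replaced by a pretrained image network, which is a design choice the slides do not specify.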
Phases overview | Model testing
▪ Measuring the system’s accuracy and the correctness of the predicted
answers.
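One simple way to measure accuracy is exact-match scoring: a prediction counts as correct when it equals the reference answer after light normalization. The sketch below assumes that definition. The slides do not say which metric the project uses, and the function names are hypothetical.

```python
def normalize(answer):
    """Lowercase the answer and collapse surrounding/repeated whitespace."""
    return " ".join(answer.lower().split())


def exact_match_accuracy(predicted, expected):
    """Return the fraction of predictions that match the reference answers."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(normalize(p) == normalize(e)
                  for p, e in zip(predicted, expected))
    return correct / len(expected)


# Two of the three predictions match their reference answers.
score = exact_match_accuracy(["Red", "two ", "dog"], ["red", "two", "cat"])
print(f"accuracy = {score:.2f}")
```

Exact match is strict: it scores "a dog" against "dog" as wrong, so a softer metric may be preferable when several human answers are available per question.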
Phases overview | Interface
▪ Building a web application that lets the user interact with the
system.
▪ Using a Python web framework with CSS and JavaScript.
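As a rough illustration of the interface phase, the sketch below shows a single Flask endpoint that accepts an uploaded image and a question and returns an answer as JSON. Flask is one of the two frameworks named in the tools slide. The `/answer` route, the form field names, and the `answer_question` placeholder are assumptions made for this example, and the placeholder does not call a real model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(image_bytes, question):
    """Hypothetical placeholder: a real system would run the trained model."""
    return "unknown"


@app.route("/answer", methods=["POST"])
def answer():
    # The form is assumed to send an "image" file and a "question" text field.
    image = request.files.get("image")
    question = request.form.get("question", "").strip()
    if image is None or not question:
        return jsonify(error="Both an image and a question are required."), 400
    predicted = answer_question(image.read(), question)
    return jsonify(question=question, answer=predicted)


if __name__ == "__main__":
    app.run(debug=True)
```

A JavaScript front end could submit the form with `fetch` and display the returned `answer` field without reloading the page.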