Scene Recognition Using Convolutional Neural Network
Group Members
Dhiraj Gidde
Vinayak Kamat
Rohan Upadhye
Vivek Kumbhar
Prasad Badave
TABLE OF CONTENTS
01 Abstract
02 Problem Statement
03 Introduction
04 Goals
05 Literature
06 Methodology
07 Expected Inputs
08 Expected Outputs
09 Related Techniques
10 References
Abstract
Scene recognition is one of the hallmark tasks of
computer vision, allowing definition of a context for object
recognition. Whereas the tremendous recent progress in object
recognition tasks is due to the availability of large datasets like
ImageNet and the rise of Convolutional Neural Networks (CNNs)
for learning high-level features, performance at scene
recognition has not attained the same level of success.
This may be because current deep features trained from
ImageNet are not competitive enough for such tasks. Here, we
use a new scene-centric database called Places with over 7
million labeled pictures of scenes. We propose methods to
compare the density and diversity of image datasets and show
that Places is as dense as other scene datasets and has more
diversity. Using CNN, we learn deep features for scene
recognition tasks, and establish new state-of-the-art results on
several scene-centric datasets. A visualization of the CNN layers’
responses allows us to show differences in the internal
representations of object-centric and scene-centric networks.
Problem Statement
To recognize the scene in a captured image using a convolutional neural network.
Introduction
Understanding the world in a single glance is one of the
most accomplished feats of the human brain. It takes
only a few tens of milliseconds to recognize the
category of an object or environment, emphasizing
an important role of feed forward processing in visual
recognition.
Here we use the PLACES or SUN dataset for recognizing the scenes in the given inputs.
Ultimately, our system will display the captured scene and identify the situation it depicts.
Goals
Classifying the scene of the entire image using a CNN.
Literature
In [1], the authors measured relative density and diversity between the SUN, IMAGENET, and PLACES datasets using AMT (Amazon Mechanical Turk). They introduced PLACES as a new dataset containing 7 million images from 476 place categories.
In [2], the authors exploited the dataset bias of IMAGENET and PLACES to increase accuracy up to 70%.
In [3], the author states that convolutional neural networks help us simulate human vision, which is remarkably good at scene recognition.
In [4], the authors extended the PLACES dataset with an additional 3 million images, spanning 900 different categories.
KEYWORDS
Fig: System Architecture for Scene recognition
Methodology
Module 1: Scaled versions:
The captured image is given as the input.
Module 2: Input crops:
The image is cropped into object-centric (IMAGENET-biased) and scene-centric (PLACES-biased) views.
Module 3: Convolutional neural network (CNN)
The cropped images are classified with the help of the convolutional neural network.
CNN steps:
1. Input layer
2. Convolutional layer
3. Normalisation
4. Max pooling
5. Output layer
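The convolutional and max-pooling steps above can be sketched in plain Python. This is an illustrative toy on a tiny grayscale "image", not the project's implementation; a real system would use a framework such as TensorFlow.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

# Toy 5x5 "image" and a trivial 2x2 diagonal filter.
image = [[1, 2, 0, 1, 3],
         [4, 1, 1, 0, 2],
         [0, 2, 3, 1, 1],
         [1, 0, 2, 4, 0],
         [2, 1, 0, 1, 1]]
kernel = [[1, 0], [0, 1]]
fmap = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool(fmap)        # 2x2 after pooling
```

The pooling stage halves each spatial dimension while keeping the strongest filter responses, which is what makes the later layers' features increasingly position-invariant.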
Methodology
Module 4: Intra-scale feature
The output of the max-pooling stage of the CNN is taken as the intra-scale feature.
Module 5: Multi-scale feature
It combines all the intra-scale features and predicts the scene.
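One simple way to combine intra-scale features is to average the per-class scores across scales. This is a hedged sketch: the slides do not specify the exact fusion rule, so the averaging scheme and the scores below are illustrative assumptions.

```python
def fuse_scales(intra_scale_scores):
    """Average class scores across scales; return winning class index and fused scores."""
    n_scales = len(intra_scale_scores)
    n_classes = len(intra_scale_scores[0])
    fused = [sum(scores[c] for scores in intra_scale_scores) / n_scales
             for c in range(n_classes)]
    return fused.index(max(fused)), fused

# Hypothetical scores for 3 scene classes at 3 crop scales.
scores = [
    [0.2, 0.5, 0.3],   # full image
    [0.1, 0.7, 0.2],   # medium crop
    [0.3, 0.4, 0.3],   # tight crop
]
best, fused = fuse_scales(scores)   # class 1 wins after averaging
```

Averaging is robust when one scale is noisy; alternatives such as max-fusion or concatenating features before a final classifier are equally plausible readings of "multi-scale feature".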
Expected Inputs
Scenes: Indoor Scene, Outdoor Scene
Expected Outputs
Auditorium Hall, Railroad Track
SUN Dataset
The database contains 397 categories. The number of
images varies across categories, but there are at least 100
images per category, and 108,754 images in total.
Deep learning
libraries
TensorFlow is an open-source software library for
dataflow programming across a range of tasks. It is a
symbolic math library, and is also used for machine
learning applications such as neural networks.
Places Dataset
The Places dataset is a repository of 10 million scene
photographs, labeled with semantic scene categories and
attributes, comprising a quasi-exhaustive list of the
types of environments encountered in the world.
Convolutional neural
network
Convolutional networks were inspired
by biological processes in that the connectivity pattern
between neurons resembles the organization of the
animal visual cortex.
Related Techniques/Tools
INTRODUCTION
Scene recognition is helpful for
driverless cars: the car can
detect the scene and understand
the scenario (e.g., that there is
a crowd or a pedestrian on the
road). Scenes can be classified
into various categories, such as
indoor scenes and outdoor scenes.
Purpose
Understanding the world in a
single glance is one of the most
accomplished feats of the
human brain. It takes only a
few tens of milliseconds to
recognize the category of an
object or environment,
emphasizing an important role
of feed forward processing in
visual recognition.
Scope of project
Use-case Description
Captured image: the image captured through a mobile device or camera.
Crop the image (object-centric and scene-centric): the captured image is cropped into object-centric and scene-centric crops.
Scene classification through CNN: CNNs are trained with object-centric and scene-centric datasets.
Predicted scene: the scene predicted by the CNN.
Category of scene: the category predicted by the CNN.
Attribute of scene: the attribute predicted by the CNN.
Requirement Analysis
•Functional Requirement Specifications
•System Requirement Specifications
Functional Requirement Specifications
1. External interface requirement
   • GPU machine
System Requirements
1. Hardware requirements
   • Hard disk: 500 GB
   • RAM: 8 GB
Non-functional Requirement
Specifications
• Portability
The degree to which software running on one platform can easily be converted to run on another platform. Portability can be enhanced by using languages, OSes, and tools that are universally available and standardized.
• Reliability
The ability of the system to behave consistently in a user-acceptable manner when operating within the environment for which the system was intended. The theory and practice of hardware reliability are well established; some try to adapt them for software.
• Performance
Higher hardware specifications lead to higher performance. Since the system runs on a 64-bit operating system, it produces more accurate output.
Architecture Diagram
MODULE IDENTIFICATION
Module 1: Scaled versions:
The captured image is given as the input.
Module 2: Input crops:
The image is cropped into object-centric (IMAGENET-biased) and scene-centric (PLACES-biased) views.
Module 3: Convolutional neural network (CNN)
The cropped images are classified with the help of the convolutional neural network.
Module 4: Intra-scale feature
The output of the max-pooling stage of the CNN is taken as the intra-scale feature.
Module 5: Multi-scale feature
It combines all the intra-scale features and predicts the scene.
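The five modules could be chained as follows. This is a sketch under stated assumptions: the function names, the centre-crop policy, and the averaging fusion rule are illustrative, not taken from the project code.

```python
def center_crop(image, size):
    """Crop a size x size window from the centre of a 2D list image."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

def scene_pipeline(image, scales, classify):
    """Modules 1-5: scaled versions -> crops -> CNN scores -> multi-scale fusion."""
    intra = []
    for s in scales:                  # Module 1: scaled versions
        crop = center_crop(image, s)  # Module 2: input crops
        intra.append(classify(crop))  # Modules 3-4: CNN -> intra-scale scores
    n = len(intra[0])                 # Module 5: average scores across scales
    fused = [sum(v[c] for v in intra) / len(intra) for c in range(n)]
    return fused.index(max(fused))

# Toy 4x4 image and a dummy two-class "classifier" standing in for the CNN:
# score 0 = total crop intensity, score 1 = a constant.
image = [[0, 0, 0, 0],
         [0, 5, 5, 0],
         [0, 5, 5, 0],
         [0, 0, 0, 0]]
classify = lambda crop: [sum(sum(r) for r in crop), 10.0]
predicted = scene_pipeline(image, scales=[2, 4], classify=classify)
```

Swapping the dummy `classify` for a trained network is the only change needed to turn the skeleton into the real pipeline, which keeps the module boundaries testable in isolation.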
ALGORITHM DESIGN
CNN Steps:
1. Input layer
2. Convolutional layer
3. Normalisation
4. Max pooling
5. Output layer
DESIGN DOCUMENTS
USE-CASE Diagram
DESIGN DOCUMENTS
Flowchart
References
Research Papers
[1] Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva, "Learning Deep Features for Scene Recognition using Places Database". Massachusetts Institute of Technology, Princeton University. (2015)
[2] Luis Herranz, Shuqiang Jiang, Xiangyang Li, "Scene Recognition with CNNs: Objects, Scales and Dataset Bias". IEEE Conference on Computer Vision and Pattern Recognition. (2016)
[3] Bavin Ondieki, "Convolutional Neural Networks for Scene Recognition". Stanford University. (2016)
[4] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba, "Places: A 10 Million Image Database for Scene Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. (2017)
ANY QUESTIONS?
THANK YOU!
THE FUTURE STARTS
TODAY, NOT TOMORROW.
