Scene Recognition Using Convolutional Neural Network
Group Members
Dhiraj Gidde
Vinayak Kamat
Rohan Upadhye
Vivek Kumbhar
Prasad Badave
TABLE OF CONTENTS
01 Abstract
02 Problem Statement
03 Introduction
04 Goals
05 Literature
06 Methodology
07 Expected Inputs
08 Expected Outputs
09 Related Techniques
10 References
Abstract
Scene recognition is one of the hallmark tasks of
computer vision, allowing definition of a context for object
recognition. Whereas the tremendous recent progress in object
recognition tasks is due to the availability of large datasets like
ImageNet and the rise of Convolutional Neural Networks (CNNs)
for learning high-level features, performance at scene
recognition has not attained the same level of success.
This may be because current deep features trained from
ImageNet are not competitive enough for such tasks. Here, we
use a new scene-centric database called Places with over 7
million labeled pictures of scenes. We propose methods to
compare the density and diversity of image datasets and show
that Places is as dense as other scene datasets and has more
diversity. Using CNN, we learn deep features for scene
recognition tasks, and establish new state-of-the-art results on
several scene-centric datasets. A visualization of the CNN layers’
responses allows us to show differences in the internal
representations of object-centric and scene-centric networks.
Problem Statement
To recognize the scene in a captured image using a convolutional neural network.
Introduction
Understanding the world in a single glance is one of the
most accomplished feats of the human brain. It takes
only a few tens of milliseconds to recognize the
category of an object or environment, emphasizing
an important role of feed forward processing in visual
recognition.
Here we use the PLACES or SUN dataset for recognizing the scenes in the given inputs.
Ultimately, our system will display the captured scene and identify the situation it depicts.
Goals
Classifying the scene of the entire image using a CNN.
Literature
In [1], the authors measured relative density and diversity between the SUN, IMAGENET, and PLACES datasets using AMT (Amazon Mechanical Turk). They introduced PLACES as a new dataset containing 7 million images from 476 place categories.
In [2], the authors exploited the dataset bias of IMAGENET and PLACES to increase accuracy up to 70%.
In [3], the author states that convolutional neural networks help us simulate human vision, which is remarkably good at scene recognition.
In [4], the authors extended the PLACES dataset with an additional 3 million images, spanning 900 different categories.
KEYWORDS
Fig: System Architecture for Scene recognition
Methodology
Module 1: Scaled versions:
The captured image is given as the input.
Module 2: Input crops:
The image is cropped into object-centric (IMAGENET-biased) and scene-centric (PLACES-biased) views.
Module 3: Convolutional neural network (CNN)
The cropped images are classified with the help of the convolutional neural network.
CNN steps:
1. Input layer
2. Convolutional layer
3. Normalisation
4. Max pooling
5. Output layer
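The convolutional and max-pooling steps above can be sketched in plain Python. This is an illustrative toy on a tiny grayscale "image", not the project's implementation; a real system would use a framework such as TensorFlow.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

# Toy 5x5 "image" and a trivial 2x2 diagonal filter.
image = [[1, 2, 0, 1, 3],
         [4, 1, 1, 0, 2],
         [0, 2, 3, 1, 1],
         [1, 0, 2, 4, 0],
         [2, 1, 0, 1, 1]]
kernel = [[1, 0], [0, 1]]
fmap = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool(fmap)        # 2x2 after pooling
```

The pooling stage halves each spatial dimension while keeping the strongest filter responses, which is what makes the later layers' features increasingly position-invariant.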
Methodology
Module 4: Intra-scale feature
The output of the max-pooling stage of the CNN is taken as the intra-scale feature.
Module 5: Multi-scale feature
It combines all the intra-scale features and predicts the scene.
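One simple way to combine intra-scale features is to average the per-class scores across scales. This is a hedged sketch: the slides do not specify the exact fusion rule, so the averaging scheme and the scores below are illustrative assumptions.

```python
def fuse_scales(intra_scale_scores):
    """Average class scores across scales; return winning class index and fused scores."""
    n_scales = len(intra_scale_scores)
    n_classes = len(intra_scale_scores[0])
    fused = [sum(scores[c] for scores in intra_scale_scores) / n_scales
             for c in range(n_classes)]
    return fused.index(max(fused)), fused

# Hypothetical scores for 3 scene classes at 3 crop scales.
scores = [
    [0.2, 0.5, 0.3],   # full image
    [0.1, 0.7, 0.2],   # medium crop
    [0.3, 0.4, 0.3],   # tight crop
]
best, fused = fuse_scales(scores)   # class 1 wins after averaging
```

Averaging is robust when one scale is noisy; alternatives such as max-fusion or concatenating features before a final classifier are equally plausible readings of "multi-scale feature".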
Expected Inputs
Scenes: Indoor Scene, Outdoor Scene
Expected Outputs
Auditorium Hall, Railroad Track
SUN Dataset
The database contains 397 categories. The number of
images varies across categories, but there are at least 100
images per category, and 108,754 images in total.
Deep learning
libraries
TensorFlow is an open-source software library for
dataflow programming across a range of tasks. It is a
symbolic math library, and is also used for machine
learning applications such as neural networks.
Places Dataset
The Places dataset is a repository of 10 million scene
photographs, labeled with semantic scene categories and
attributes, comprising a quasi-exhaustive list of the
types of environments encountered in the world.
Convolutional neural
network
Convolutional networks were inspired
by biological processes in that the connectivity pattern
between neurons resembles the organization of the
animal visual cortex.
Related Techniques/Tools
INTRODUCTION
Scene recognition is helpful for
driverless cars: the car can
detect the scene and understand
the scenario (e.g., that there is
a crowd or a pedestrian on the
road). Scenes can be classified
into various categories, such as
indoor scenes and outdoor scenes.
Purpose
Understanding the world in a
single glance is one of the most
accomplished feats of the
human brain. It takes only a
few tens of milliseconds to
recognize the category of an
object or environment,
emphasizing an important role
of feed forward processing in
visual recognition.
Scope of project
Use-case Description
Captured image: the image captured through a mobile device or camera.
Crop the image (object-centric and scene-centric): the captured image is cropped into object-centric and scene-centric crops.
Scene classification through CNN: CNNs are trained with object-centric and scene-centric datasets.
Predicted scene: the scene predicted by the CNN.
Category of scene: the category predicted by the CNN.
Attribute of scene: the attribute predicted by the CNN.
Requirement Analysis
•Functional Requirement Specifications
•System Requirement Specifications
Functional Requirement Specifications
1. External interface requirement
   • GPU machine
System Requirements
1. Hardware requirements
   • Hard disk: 500 GB
   • RAM: 8 GB
Non-functional Requirement
Specifications
• Portability
The degree to which software running on one platform can easily be converted to run on another platform. Portability can be enhanced by using languages, OSes, and tools that are universally available and standardized.
• Reliability
The ability of the system to behave consistently in a user-acceptable manner when operating within the environment for which the system was intended. The theory and practice of hardware reliability are well established; some try to adapt them for software.
• Performance
Higher hardware specifications lead to higher performance. Since the system runs on a 64-bit operating system, it produces more accurate output.
Architecture Diagram
MODULE IDENTIFICATION
Module 1: Scaled versions:
The captured image is given as the input.
Module 2: Input crops:
The image is cropped into object-centric (IMAGENET-biased) and scene-centric (PLACES-biased) views.
Module 3: Convolutional neural network (CNN)
The cropped images are classified with the help of the convolutional neural network.
Module 4: Intra-scale feature
The output of the max-pooling stage of the CNN is taken as the intra-scale feature.
Module 5: Multi-scale feature
It combines all the intra-scale features and predicts the scene.
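The five modules could be chained as follows. This is a sketch under stated assumptions: the function names, the centre-crop policy, and the averaging fusion rule are illustrative, not taken from the project code.

```python
def center_crop(image, size):
    """Crop a size x size window from the centre of a 2D list image."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

def scene_pipeline(image, scales, classify):
    """Modules 1-5: scaled versions -> crops -> CNN scores -> multi-scale fusion."""
    intra = []
    for s in scales:                  # Module 1: scaled versions
        crop = center_crop(image, s)  # Module 2: input crops
        intra.append(classify(crop))  # Modules 3-4: CNN -> intra-scale scores
    n = len(intra[0])                 # Module 5: average scores across scales
    fused = [sum(v[c] for v in intra) / len(intra) for c in range(n)]
    return fused.index(max(fused))

# Toy 4x4 image and a dummy two-class "classifier" standing in for the CNN:
# score 0 = total crop intensity, score 1 = a constant.
image = [[0, 0, 0, 0],
         [0, 5, 5, 0],
         [0, 5, 5, 0],
         [0, 0, 0, 0]]
classify = lambda crop: [sum(sum(r) for r in crop), 10.0]
predicted = scene_pipeline(image, scales=[2, 4], classify=classify)
```

Swapping the dummy `classify` for a trained network is the only change needed to turn the skeleton into the real pipeline, which keeps the module boundaries testable in isolation.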
ALGORITHM DESIGN
CNN Steps:
1. Input layer
2. Convolutional layer
3. Normalisation
4. Max pooling
5. Output layer
DESIGN DOCUMENTS
USE-CASE Diagram
DESIGN DOCUMENTS
Flowchart
References
Research Papers
[1] Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva, "Learning Deep Features for Scene Recognition using Places Database". Massachusetts Institute of Technology, Princeton University. (2015)
[2] Luis Herranz, Shuqiang Jiang, Xiangyang Li, "Scene Recognition with CNNs: Objects, Scales and Dataset Bias". IEEE Conference on Computer Vision and Pattern Recognition. (2016)
[3] Bavin Ondieki, "Convolutional Neural Networks for Scene Recognition". Stanford University. (2016)
[4] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba, "Places: A 10 Million Image Database for Scene Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. (2017)
ANY QUESTIONS?
THANK YOU!
THE FUTURE STARTS
TODAY, NOT TOMORROW.
