International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 06 | June -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1302
SCENE DESCRIPTION FROM IMAGES TO SENTENCES
Khushali Acharya , Abhinay Pandya
1,Computer Engineering
LDRP-ITR
Gandhinagar, India
2Prof & HOD, Information Technology
LDRP-ITR
Gandhinagar, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract— People exchange their views using language,
whether spoken, written, or typed. A notable amount of this
language describes the environment around us, especially the
visual scenes in our surroundings or depicted in images or
video. Scene description aims to generate sentences from a
given set of input images. It links visual perception with the
language space. Present approaches are set up purely as
supervised machine learning. However, owing to the
dearth of training data, this seldom achieves the desired accuracy.
We present a model that uses “Distributed Intelligence”, a
prevalent theme in the artificial intelligence literature. Rather
than relying only on the training datasets (PASCAL VOC 2012
containing 11530 images, FLICKR8K, FLICKR30K & MSCOCO),
we harness the power of the internet in order to generate more
precise sentences related to the images.
Keywords—Image Processing, Opencv, Distributed
Intelligence, Object Detection
1. INTRODUCTION
“A picture is worth a 1000 words” WRONG! Why?
We see in our daily life how important words are to us. If a
picture were enough, one might wonder why the
memes we see daily on Instagram and Facebook have
sentences written on them. There are hashtags, descriptions,
comments etc. to make a post more interesting. Thus, a
description of an image makes it more compelling and
captivating. We understand images in our brains through
processes that we do not know enough about. For example,
have a glance at the image below.
Fig 1: A group of tribal people selling vegetables in market.
One can describe this image as:
- A group of people sitting on land.
- A group of people selling bananas/fruits.
- A group of tribal people selling fruits in a local market.
Of the above sentences, the third seems the most
acceptable. Will the answer be the same if we ask a 7-year-old?
Of course not. How would you communicate with aliens? How
would you teach them your language? The way we teach
children, right? By showing them pictures of various objects
along with their names. In order to make a meaningful sentence,
some prior knowledge is a must for humans as well
as computers. Thus, in order to transfer information
unambiguously, we must attach textual cues to an image to
support its correct interpretation/view-point.
Scene Description can be helpful in various fields. Some of
them are listed below:
- Robotics
- Content-based image search
- For visually impaired people
- Soccer game analysis
- Criminal Act Recognition
- Agricultural Sector
2. RELATED WORK
Show, Attend and Tell: Neural Image Caption Generation
with Visual Attention (2015) [1]:
This paper introduces two attention-based caption generators
under a common framework: 1) a “soft” deterministic attention
mechanism trainable by standard back-propagation methods and
2) a “hard” stochastic attention mechanism trainable by
maximizing an approximate variational lower bound or
equivalently by REINFORCE (Williams, 1992).
Deep Visual-Semantic Alignments for Generating Image
Descriptions (2015) [2]:
Their approach leverages datasets of images and their
sentence descriptions to learn about the inter-modal
correspondences between language and visual data. It is a
combination of Convolutional Neural Networks over image
regions, bidirectional Recurrent Neural Networks over
sentences, and a structured objective that aligns the two
modalities through a multimodal embedding. They first
present a model that aligns sentence snippets to the visual
regions that they describe through a multimodal embedding.
Then they treat these correspondences as training data for a
second, multimodal Recurrent Neural Network model that
learns to generate the snippets.
Choosing Linguistics over Vision to Describe Images
(2012) [3]:
Given a collection of images and their corresponding human-
generated descriptions, they address the problem of
automatically generating human-like descriptions for unseen
images. This method captures the semantics of an image
using information coded in its description and provides
extensive evaluations to test the applicability of their model
on the IAPR TC-12 benchmark.
Baby Talk: Understanding and Generating Simple Image
Descriptions (2013) [4]:
This system automatically generates natural language
descriptions from images, exploiting both statistics gleaned
from parsing large quantities of text data and recognition
algorithms from computer vision. This is accomplished by
detecting objects, modifiers (adjectives), and spatial
relationships (prepositions) in an image, smoothing these
detections with respect to a statistical prior obtained from
descriptive text, and then using the smoothed results as
constraints for sentence generation. Sentence generation is
performed using either an n-gram language model or a simple
template-based approach.
3. OUR APPROACH
1) Object detection and labelling
The very first step in our approach is to detect the objects
in the image. Our goal in this step is to obtain bounding
boxes around the recognized objects along with their
co-ordinates. For this purpose we use the YOLO [5]
detection system, which is built on the Darknet [6] framework.
YOLO has 24 convolutional layers followed by 2 fully
connected layers. While GoogLeNet uses inception modules,
YOLO simply uses 1×1 reduction layers followed by 3×3
convolutional layers, similar to Lin et al. [7]. A single
convolutional neural network performs feature extraction,
bounding box prediction, non-maximal suppression, and
contextual reasoning all concurrently. Instead of static
features, the network trains the features in-line and optimizes
them for the detection task.
2) Find Relative Position of these Objects
The next step is to find the relative position of each object with
respect to the other objects and the background. If there are N
objects detected in our image, then there will be
nC2 = N(N-1)/2 pairwise relations between these objects.
Simple prepositional functions that evaluate the spatial
relationships between bounding boxes are designed, which
provide a score for each of 16 preposition
terms (e.g. above, under, against, beneath, in, on etc.). For
example, the score for ‘above (a, b)’ is computed as the
percentage of region ‘a’ that lies in the image rectangle ‘above’
the bounding box around region ‘b’. The potential for ‘near (a,
b)’ is computed as the minimum distance between region ‘a’
and region ‘b’ divided by the diagonal size of a bounding box
around region ‘a’. Similar functions are used for the other
preposition terms.
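The two prepositional scores described in this step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: boxes are (x_min, y_min, x_max, y_max) tuples with y growing downwards, the rectangle ‘above’ a box is taken to span the full image width, and the function names (above_score, near_score, pairwise_relations) are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import hypot

# Assumption: a box is (x_min, y_min, x_max, y_max), with y growing downwards.

def area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def above_score(a, b):
    """Fraction of box a that lies in the image rectangle above box b."""
    # Assumption: the rectangle above b spans the whole image width,
    # so only the part of a below b's top edge is cut away.
    clipped = (a[0], a[1], a[2], min(a[3], b[1]))
    total = area(a)
    return area(clipped) / total if total else 0.0

def near_score(a, b):
    """Minimum gap between boxes a and b divided by the diagonal of a."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)  # horizontal gap, 0 if overlapping
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)  # vertical gap, 0 if overlapping
    diag = hypot(a[2] - a[0], a[3] - a[1])
    return hypot(dx, dy) / diag if diag else 0.0

def pairwise_relations(boxes):
    """Score every unordered pair of labelled boxes: N(N-1)/2 pairs in total."""
    return {
        (label_a, label_b): {"above": above_score(a, b), "near": near_score(a, b)}
        for (label_a, a), (label_b, b) in combinations(boxes.items(), 2)
    }
```

With three labelled boxes, pairwise_relations returns 3 entries, matching nC2 for N = 3. A smaller ‘near’ value means the two regions are closer together.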
3) Find a text label for the background scene
In the next step we need to find a text label for the
background scene. This is done by eliminating the detected
objects from the image. The remaining part of the image is
considered the background. It is labelled using multiclass
Support Vector Machines trained on various
background scenes.
The primary process of eliminating the objects is done by
inverting the output of the GrabCut algorithm. GrabCut is
originally a foreground extraction algorithm, but we invert
its result in order to get the background. In this algorithm,
the user needs to draw a rectangle around the foreground
area. The algorithm then segments it iteratively to get the
best result.
After separating the foreground and background, the
inversion is performed, which yields the background image.
Next we are going to use the concept of support vector
machine. Support vector machines (SVMs) are a set of
supervised learning methods used for classification,
regression and outlier detection.
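The inversion of the GrabCut result described in this step can be illustrated with a small pure-Python sketch. It assumes the segmentation mask uses OpenCV's GrabCut label values (0 and 2 for definite and probable background, 1 and 3 for definite and probable foreground) and represents images as nested lists of (r, g, b) tuples. The helper name background_only and the black fill colour are assumptions made for illustration, not part of the paper.

```python
# Label values used in an OpenCV GrabCut mask.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

def background_only(image, mask, fill=(0, 0, 0)):
    """Keep pixels labelled (probable) background and blank out the rest.

    image: rows of (r, g, b) tuples; mask: rows of GrabCut labels.
    """
    keep = {GC_BGD, GC_PR_BGD}
    return [
        [pixel if label in keep else fill
         for pixel, label in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]
```

The resulting background-only image is what would then be passed to the scene classifier.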
4) Populate a tuple for that image
We now have three things: the detected objects, the label of
the background and the spatial relationships between the
objects. These three entities form a tuple of the
form <obj 1, obj 2…obj n, Scene, Relations>. There can be
multiple detected objects in the image, and Scene is the
background label. Relations here represent attributes such
as directions, background-foreground relations, touching,
non-touching etc.
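One possible in-memory representation of the tuple <obj 1, obj 2…obj n, Scene, Relations> is sketched below. The class name SceneTuple, its field names, the encoding of relations as (subject, preposition, object) triples and the keywords helper are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SceneTuple:
    objects: list   # detected object labels, e.g. ["person", "banana"]
    scene: str      # background label from the scene classifier
    # Assumed encoding: (subject, preposition, object) triples.
    relations: list = field(default_factory=list)

    def keywords(self):
        """Flatten the tuple into the set of lower-cased words used for lookup."""
        words = set(self.objects) | {self.scene}
        words |= {preposition for _, preposition, _ in self.relations}
        return {word.lower() for word in words}
```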
5) Find from the parallel corpora or word co-occurrence data,
the most likely set of sentences containing these words
By now we have the tuple of the objects, the scene and the
relationships between them. But these items on their own are
not enough to generate a meaningful sentence. In order to
generate a relevant sentence, we use the notion
of “Distributed Intelligence”.
We infuse a knowledge source into the problem
in order to solve it. Infusing knowledge means
providing a dataset to the computer. The dataset used here is
Wikipedia. Based on this dataset, an ontology is
populated. This ontology is not hand-crafted; it is built
automatically. The result is a statistical, probabilistic
graphical model. It is an undirected
graphical model much along the lines of a Conditional Random
Field (CRF) or Markov Random Field (MRF).
The tuple <obj 1, obj 2…obj n, Scene, Relations> is compared
with the Wikipedia dataset. The top 5 sentences containing the
combination of words in the tuple are extracted. We have prior
knowledge in the form of Wikipedia sentences. The
likelihood is calculated given this prior
information, and after integrating them in a Bayesian
framework, the final answer is obtained.
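The retrieval of the top 5 sentences can be approximated by a simple ranking sketch. Note that this is a deliberately simplified stand-in: it ranks sentences by plain keyword overlap rather than by the Bayesian scoring described above, and the function name top_sentences and its parameters are hypothetical.

```python
import re

def top_sentences(keywords, corpus, k=5):
    """Rank corpus sentences by how many tuple keywords they contain."""
    keywords = {word.lower() for word in keywords}
    scored = []
    for sentence in corpus:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        overlap = len(keywords & tokens)
        if overlap:  # ignore sentences sharing no word with the tuple
            scored.append((overlap, sentence))
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]
```

A fuller version would weight each word by its co-occurrence statistics instead of counting every match equally.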
6) Perform Google image search and find relevant images for
each such sentence
The 5 most relevant sentences containing the words of the
tuple have been extracted from the Wikipedia dataset. The
Google Image Search API provides a JavaScript interface to
embed Google Image Search results in a website or
application. For each of the 5 sentences, the following steps
are performed:
- Give the sentence retrieved in the above step as input
to the Google Image Search API.
- Download the top 10 images for each such sentence.
- Compare each of the 10 images with the original
image and generate a similarity score.
- The image that best matches the actual image is
kept in a table along with its score.
- This process is repeated for all 5 sentences, and thus
we get 5 images with their scores in the form of a table.
7) Compare retrieved images with the input image and assign
similarity score
The 10 images downloaded for each sentence are compared
with the actual image. Various algorithms can be used for the
comparison, such as key point matching, histogram methods,
feature matching techniques with decision trees etc.
One can also use distance metrics such as the
Euclidean distance, the Manhattan (also called city block)
distance, and the Chebyshev distance. A similarity
measure must be selected to decide how close one vector
is to another. The problem can be converted to
computing the discrepancy between two vectors x, y ∈ R^d. The
Euclidean distance between x, y ∈ R^d is computed by:
$$d_{E}(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$
The Manhattan (city block) distance, which takes fewer
operations, is computed by:
$$d_{M}(x, y) = \sum_{i=1}^{d} |x_i - y_i|$$
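As a minimal sketch, the three distance metrics named in this step can be written in plain Python using their standard definitions. The function names are chosen here for illustration; x and y are assumed to be equal-length sequences of numbers.

```python
from math import sqrt

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences (city block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))
```

For x = (0, 0) and y = (3, 4) these give 5.0, 7 and 4 respectively. The Manhattan distance avoids the squaring and square root, which is why it takes fewer operations.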
8) Find the image which has the highest similarity score with
the input image. The sentence corresponding to this image is
most likely the one for our image too.
By now we have similarity scores for all the
retrieved images. Across the 5 sentences and their
10 images per sentence, the image with the highest
similarity score, or equivalently the least distance, is picked.
The sentence belonging to that image is taken as the sentence
for the actual image too.
RGB color histograms of the actual image and of all the
retrieved images are generated. Then the Euclidean distance
between the histogram of the actual image and that of each
retrieved image is calculated. The image having the least
distance to the actual image is taken as the intermediate
result. The sentence corresponding to this intermediate
result is the output sentence which describes the scene in
the input image.
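The histogram comparison and final selection can be sketched as below. This is an illustration under assumptions: images are given as lists of (r, g, b) tuples with values in 0..255, a joint histogram with 4 bins per channel is used (the paper does not state a bin count), and the names rgb_histogram, histogram_distance and best_sentence are hypothetical.

```python
from math import sqrt

def rgb_histogram(pixels, bins=4):
    """Normalised joint RGB histogram; bins is assumed to divide 256."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    count = 0
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
        count += 1
    return [h / count for h in hist] if count else hist

def histogram_distance(pixels_a, pixels_b, bins=4):
    """Euclidean distance between the RGB histograms of two images."""
    hist_a = rgb_histogram(pixels_a, bins)
    hist_b = rgb_histogram(pixels_b, bins)
    return sqrt(sum((a - b) ** 2 for a, b in zip(hist_a, hist_b)))

def best_sentence(query_pixels, candidates):
    """candidates maps each sentence to a list of retrieved images.

    Returns the sentence whose retrieved image is closest to the query.
    """
    distance, sentence = min(
        ((histogram_distance(query_pixels, image), sentence)
         for sentence, images in candidates.items()
         for image in images),
        key=lambda pair: pair[0],
    )
    return sentence
```

Normalising each histogram by the pixel count keeps images of different sizes comparable.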
Fig 2: Architecture of our approach
4. EXPERIMENTS AND RESULTS
As the first step of our implementation, we start with
detecting objects in the image. According to [5], the ImageNet
1000-class competition dataset and the PASCAL VOC detection
dataset are used.
Fig 3: Object Detection and Labelling
Multiple bounding boxes are detected in the above
step. The (x, y) coordinates of all the bounding boxes are
obtained along with the probability that the object exists.
From these coordinates, the spatial relationships between the
objects can be established.
We compare our results with our base paper,
Baby Talk [4]. The comparison is based on two metrics: BLEU
and ROUGE.
BLEU Score Measured for Generated Descriptions versus the
Set of Descriptions Produced by Human Annotators
No. of Images    Babytalk [4]    Our Approach
100              0.40            0.42
200              0.45            0.37
500              0.38            0.37

ROUGE Score Measured for Generated Descriptions versus
the Set of Descriptions Produced by Human Annotators
No. of Images    Babytalk [4]    Our Approach
100              0.43            0.44
200              0.41            0.33
500              0.36            0.38
5. DISCUSSION ON RESULTS OBTAINED
We compare our results only with Baby Talk [4], since their
work comes closest to ours. The research community uses the
BLEU score or the ROUGE score, which are essentially measures
of the lexical similarity between the sentences in the ground
truth data and the sentences our algorithm generates. This is
not entirely appropriate, since BLEU and ROUGE miss the
semantic closeness of sentences. We find that many of the
sentences our algorithm generated agree with human judgment
but did not match the ground truth sentences exactly.
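The limitation of lexical metrics can be seen with a simplified sketch. The function below computes clipped unigram precision (the 1-gram core of BLEU, without the brevity penalty or higher-order n-grams) and unigram recall (ROUGE-1) against a single reference. It is not the full BLEU or ROUGE used for the reported scores, and the function name unigram_overlap is hypothetical.

```python
from collections import Counter

def unigram_overlap(candidate, reference):
    """Return (precision, recall) of clipped unigram matches.

    Precision corresponds to BLEU's modified 1-gram precision,
    recall to ROUGE-1, both against one reference sentence.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection clips each word to its count in the reference.
    matches = sum((Counter(cand) & Counter(ref)).values())
    precision = matches / len(cand) if cand else 0.0
    recall = matches / len(ref) if ref else 0.0
    return precision, recall
```

For the reference "a group of people selling fruits", the candidate "several vendors offer bananas" scores (0.0, 0.0) even though a human might accept it as a description of the same scene, because no word is shared.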
For some of the images on which our algorithm fails to find a
description, the scene backgrounds were not
found in our training images. Our algorithm relies on
transferring acquired knowledge to new unseen images, and
hence it fails when a new scene background is encountered.
However, this problem can easily be overcome if we expand
our training dataset to include many possible backgrounds.
Another source of error was misidentification of objects. This
too can be resolved if more training data samples are used.
Lighting conditions, quality of images, and other noise factors
were also responsible for somewhat poor object recognition.
6. CONCLUSION AND FUTURE WORK
Thus we propose an approach that generates sentences
from an image by understanding the entities in the image and
their relations to each other, and which further uses
Distributed Intelligence in order to generate proper
sentences.
The series of activities carried out in the entire process
is: object and scene recognition, tuple generation, sentence
extraction, image search, and taking the sentence with the
highest similarity score as output. Based on further
experience, the work can be extended in future so as to
increase its efficiency.
- Here, we record the top 5 sentences from the web
depending on the likelihood of the words from the
tuple. This reduces the chances of describing unusual
activities captured in an image. In future,
this limitation can be addressed.
- The object detection technique falls short in detecting
smaller objects in the background. This can also be
improved so that minute objects get recognized.
- Instead of using the Wikipedia dataset, one can use a much
larger dataset that covers a wide range of sentences,
including unusual activities.
7. REFERENCES
[1] Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho,
Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel,
Yoshua Bengio; Show, Attend and Tell: Neural Image Caption
Generation with Visual Attention, 2015
[2] Andrej Karpathy, Li Fei-Fei (Department of Computer
Science, Stanford University); Deep Visual-Semantic
Alignments for Generating Image Descriptions, 2015
[3] Ankush Gupta, Yashaswi Verma, C.V. Jawahar; Choosing
Linguistics over Vision to Describe Images, 2012
[4] Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming
Li, Yejin Choi, Alexander C Berg, Tamara L Berg (Stony Brook
University, NY 11794, USA); Baby Talk: Understanding and
Generating Simple Image Descriptions, 2013
[5] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali
Farhadi (University of Washington, Allen Institute for AI,
Facebook AI Research); You Only Look Once: Unified, Real-Time
Object Detection, 2016
[6] Joseph Redmon; Darknet: Open Source Neural
Networks in C, http://pjreddie.com/yolo/, 2013-2016
[7] M. Lin, Q. Chen, and S. Yan; Network in Network, CoRR,
abs/1312.4400, 2013

More Related Content

What's hot

GROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCEGROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCEijaia
 
Understanding user intention in image retrieval: generalization selection usi...
Understanding user intention in image retrieval: generalization selection usi...Understanding user intention in image retrieval: generalization selection usi...
Understanding user intention in image retrieval: generalization selection usi...TELKOMNIKA JOURNAL
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Test PDF
Test PDFTest PDF
Test PDFAlgnuD
 
Pioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block CodingPioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block CodingDR.P.S.JAGADEESH KUMAR
 
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence MatrixSteganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence MatrixCSCJournals
 
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEYIMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEYijcsit
 
Reproducible data science and business solutions
Reproducible data science and business solutionsReproducible data science and business solutions
Reproducible data science and business solutionsAntonio Rueda-Toicen
 
Kernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingKernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingIAEME Publication
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET Journal
 
Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework Deep Learning Italia
 
Hash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurityHash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurityIan Beckett
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...IAESIJEECS
 
Image re ranking system
Image re ranking systemImage re ranking system
Image re ranking systemveningstonk
 

What's hot (19)

Cl4301502506
Cl4301502506Cl4301502506
Cl4301502506
 
GROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCEGROUPING OBJECTS BASED ON THEIR APPEARANCE
GROUPING OBJECTS BASED ON THEIR APPEARANCE
 
Understanding user intention in image retrieval: generalization selection usi...
Understanding user intention in image retrieval: generalization selection usi...Understanding user intention in image retrieval: generalization selection usi...
Understanding user intention in image retrieval: generalization selection usi...
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Test PDF
Test PDFTest PDF
Test PDF
 
IEEE ICAPR 2009
IEEE ICAPR 2009IEEE ICAPR 2009
IEEE ICAPR 2009
 
Pioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block CodingPioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block Coding
 
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence MatrixSteganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
 
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEYIMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
 
Reproducible data science and business solutions
Reproducible data science and business solutionsReproducible data science and business solutions
Reproducible data science and business solutions
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
Kernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingKernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of moving
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and Challenges
 
Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework Machine Learning Explanations: LIME framework
Machine Learning Explanations: LIME framework
 
D0341829
D0341829D0341829
D0341829
 
Hash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurityHash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurity
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...
 
F0343545
F0343545F0343545
F0343545
 
Image re ranking system
Image re ranking systemImage re ranking system
Image re ranking system
 

Similar to Scene Description From Images To Sentences

IRJET- Neural Story Teller using RNN and Generative Algorithm
IRJET- Neural Story Teller using RNN and Generative AlgorithmIRJET- Neural Story Teller using RNN and Generative Algorithm
IRJET- Neural Story Teller using RNN and Generative AlgorithmIRJET Journal
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 
Image Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningImage Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningIRJET Journal
 
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...IRJET Journal
 
IRJET- Image Captioning using Multimodal Embedding
IRJET-  	  Image Captioning using Multimodal EmbeddingIRJET-  	  Image Captioning using Multimodal Embedding
IRJET- Image Captioning using Multimodal EmbeddingIRJET Journal
 
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2IRJET Journal
 
MUSIC RECOMMENDATION THROUGH FACE RECOGNITION AND EMOTION DETECTION
MUSIC RECOMMENDATION THROUGH FACE RECOGNITION AND EMOTION DETECTIONMUSIC RECOMMENDATION THROUGH FACE RECOGNITION AND EMOTION DETECTION
MUSIC RECOMMENDATION THROUGH FACE RECOGNITION AND EMOTION DETECTIONIRJET Journal
 
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Trackingijsrd.com
 
ArtificialIntelligenceInObjectDetection-Report.pdf
ArtificialIntelligenceInObjectDetection-Report.pdfArtificialIntelligenceInObjectDetection-Report.pdf
ArtificialIntelligenceInObjectDetection-Report.pdfAbishek86232
 
Computer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectComputer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectIOSR Journals
 
IRJET- Visual Question Answering using Combination of LSTM and CNN: A Survey
IRJET- Visual Question Answering using Combination of LSTM and CNN: A SurveyIRJET- Visual Question Answering using Combination of LSTM and CNN: A Survey
IRJET- Visual Question Answering using Combination of LSTM and CNN: A SurveyIRJET Journal
 
A Review on Matching For Sketch Technique
A Review on Matching For Sketch TechniqueA Review on Matching For Sketch Technique
A Review on Matching For Sketch TechniqueIOSR Journals
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learningijtsrd
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
IRJET - Direct Me-Nevigation for Blind People
IRJET -  	  Direct Me-Nevigation for Blind PeopleIRJET -  	  Direct Me-Nevigation for Blind People
IRJET - Direct Me-Nevigation for Blind PeopleIRJET Journal
 
IRJET - Gender and Age Prediction using Wideresnet Architecture
IRJET - Gender and Age Prediction using Wideresnet ArchitectureIRJET - Gender and Age Prediction using Wideresnet Architecture
IRJET - Gender and Age Prediction using Wideresnet ArchitectureIRJET Journal
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
IRJET- Facial Expression Recognition using GPA Analysis
IRJET-  	  Facial Expression Recognition using GPA AnalysisIRJET-  	  Facial Expression Recognition using GPA Analysis
IRJET- Facial Expression Recognition using GPA AnalysisIRJET Journal
 
Computer Vision Performance and Image Quality Metrics : A Reciprocal Relation
Computer Vision Performance and Image Quality Metrics : A Reciprocal Relation Computer Vision Performance and Image Quality Metrics : A Reciprocal Relation
Computer Vision Performance and Image Quality Metrics : A Reciprocal Relation cscpconf
 

Similar to Scene Description From Images To Sentences (20)

IRJET- Neural Story Teller using RNN and Generative Algorithm
IRJET- Neural Story Teller using RNN and Generative AlgorithmIRJET- Neural Story Teller using RNN and Generative Algorithm
IRJET- Neural Story Teller using RNN and Generative Algorithm
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
Image Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningImage Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep Learning
 
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
 
IRJET- Image Captioning using Multimodal Embedding




Scene Description From Images To Sentences

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 04 Issue: 06 | June 2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1302

SCENE DESCRIPTION FROM IMAGES TO SENTENCES

Khushali Acharya, Abhinay Pandya
1 Computer Engineering, LDRP-ITR, Gandhinagar, India
2 Prof & HOD, Information Technology, LDRP-ITR, Gandhinagar, India

Abstract: People exchange their views using language, whether spoken, written, or typed. A notable amount of this language describes the environment around us, especially the visual scene in our surroundings or depicted in images or video. Scene description aims to generate sentences from a given set of input images. It links visual perception with the language space. Existing approaches are purely supervised machine learning setups. However, owing to the dearth of training data, they seldom achieve the desired accuracy. We present a model that uses “Distributed Intelligence”, a prevalent theme in the artificial intelligence literature. Rather than relying only on the training datasets (PASCAL VOC 2012 containing 11530 images, FLICKR8K, FLICKR30K & MSCOCO), we harness the power of the internet in order to generate more precise sentences related to the images.

Keywords: Image Processing, OpenCV, Distributed Intelligence, Object Detection

1. INTRODUCTION

“A picture is worth a 1000 words.” WRONG! Why? We see in our daily life how important words are to us. If a picture were enough, one must wonder why the memes we see daily on Instagram and Facebook have sentences written on them. There are hashtags, descriptions, comments etc. to make the post more interesting. Thus, the description of an image makes it more compelling and captivating.
We understand images in our brains through processes that we do not know enough about. For example, have a glance at the image below.

Fig 1: A group of tribal people selling vegetables in market.

One can describe this image as:
- A group of people sitting on land.
- A group of people selling bananas/fruits.
- A group of tribal people selling fruits in a local market.

Of the above sentences, the third seems the most acceptable. Would a 7-year-old give the same answer? Of course not. How would you communicate with aliens? How would you teach them your language? The way we teach children, right? By showing them pictures of various objects along with their names. To make a meaningful sentence, some prior knowledge is a must, for humans as well as for computers. Thus, in order to transfer information unambiguously, we must attach textual cues to an image to support its correct interpretation/view-point.

Scene description can be helpful in various fields. Some of them are listed below:
- Robotics
- Content-based image search
- For visually impaired people
- Soccer game analysis
- Criminal Act Recognition
- Agricultural Sector
2. RELATED WORK

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015) [1]: This paper trains the model in a deterministic manner using standard back-propagation techniques. Two attention-based caption generators are introduced under a common framework: 1) a “soft” deterministic attention mechanism trainable by standard back-propagation methods and 2) a “hard” stochastic attention mechanism trainable by maximizing an approximate variational lower bound or, equivalently, by REINFORCE (Williams, 1992).

Deep Visual-Semantic Alignments for Generating Image Descriptions (2015) [2]: This approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. It combines Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. The authors first present a model that aligns sentence snippets to the visual regions that they describe through a multimodal embedding. They then treat these correspondences as training data for a second, multimodal Recurrent Neural Network model that learns to generate the snippets.

Choosing Linguistics over Vision to Describe Images (2012) [3]: Given a collection of images and their corresponding human-generated descriptions, the authors address the problem of automatically generating human-like descriptions for unseen images. The method captures the semantics of an image using information coded in its description, and the authors provide extensive evaluations to test the applicability of their model on the IAPR TC-12 benchmark.
Baby Talk: Understanding and Generating Simple Image Descriptions (2013) [4]: This system automatically generates natural language descriptions from images, exploiting both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision. This is accomplished by detecting objects, modifiers (adjectives), and spatial relationships (prepositions) in an image, smoothing these detections with respect to a statistical prior obtained from descriptive text, and then using the smoothed results as constraints for sentence generation. Sentence generation is performed using either an n-gram language model or a simple template-based approach.

3. OUR APPROACH

1) Object detection and labelling

The very first step in our approach is to detect the objects in the image. Our goal in this step is to obtain bounding boxes around the recognized objects along with their co-ordinates. For this purpose we use the YOLO [5] detection system, which uses the Darknet [6] framework. YOLO has 24 convolutional layers followed by 2 fully connected layers. While GoogLeNet uses inception modules, YOLO simply uses 1×1 reduction layers followed by 3×3 convolutional layers, similar to Lin et al. [7]. A single convolutional neural network performs feature extraction, bounding box prediction, non-maximal suppression, and contextual reasoning all concurrently. Instead of using static features, the network trains the features in-line and optimizes them for the detection task.

2) Find Relative Position of these Objects

The next step is to find the relative position of each object with respect to the other objects and the background. If N objects are detected in the image, there are $\binom{N}{2} = N(N-1)/2$ pairwise relations between them. Simple prepositional functions are designed that evaluate the spatial relationships between bounding boxes and provide a score for each of 16 preposition terms (e.g. above, under, against, beneath, in, on etc.).
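A minimal Python sketch of this step, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) with y growing downward. It enumerates every unordered pair of detected objects and scores two of the sixteen prepositions, `above` and `near`. The function names, the dictionary layout, and the reading of the region "above" a box as the full-width strip above its top edge are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations
from math import hypot


def area(box):
    """Area of an (x_min, y_min, x_max, y_max) box; degenerate boxes give 0."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def above_score(a, b):
    """Fraction of box a lying above the top edge of box b (assumed reading)."""
    ax0, ay0, ax1, ay1 = a
    top_of_b = b[1]
    clipped = (ax0, ay0, ax1, min(ay1, top_of_b))
    total = area(a)
    return area(clipped) / total if total > 0 else 0.0


def near_score(a, b):
    """Minimum gap between the boxes divided by the diagonal of box a."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)  # horizontal gap, 0 if overlapping
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)  # vertical gap, 0 if overlapping
    diagonal = hypot(a[2] - a[0], a[3] - a[1])
    return hypot(dx, dy) / diagonal if diagonal > 0 else float("inf")


def pairwise_relations(boxes):
    """Score every unordered pair: N labelled boxes give N*(N-1)/2 entries."""
    relations = {}
    for (name_a, a), (name_b, b) in combinations(boxes.items(), 2):
        relations[(name_a, name_b)] = {
            "above": above_score(a, b),
            "near": near_score(a, b),
        }
    return relations
```

Calling `pairwise_relations({"person": (10, 10, 30, 50), "dog": (15, 60, 35, 80)})` yields one entry keyed by `("person", "dog")`. A lower `near` value means the boxes are closer relative to the size of the first box.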
For example, the score for ‘above (a, b)’ is computed as the percentage of region ‘a’ that lies in the image rectangle ‘above’ the bounding box around region ‘b’. The potential for ‘near (a, b)’ is computed as the minimum distance between region ‘a’ and region ‘b’ divided by the diagonal size of a bounding box around region ‘a’. Similar functions are used for the other preposition terms.

3) Find a text label for the background scene

The next step is to find a text label for the background scene. This is done by eliminating the detected objects from the image; the remaining part of the image is considered the background. The background is then labelled using multiclass Support Vector Machines trained on various background scenes. The objects are eliminated by applying the inverse operation to the output of the GrabCut algorithm. GrabCut is originally a foreground extraction algorithm, but we invert its result in order to get the background. In this algorithm, the user needs to draw a rectangle around the foreground area. The algorithm then segments the image iteratively to get the best result. After the foreground and background are separated, the foreground mask is inverted, which yields the background image. Next, we use a support vector machine. Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.

4) Populate a tuple for that image

At this point we have three things: the detected objects, the label of the background, and the spatial relationships between the objects. These three entities form a tuple of the form <obj 1, obj 2…obj n, Scene, Relations>. There can be multiple detected objects in an image, and Scene is the background label. Relations here represent attributes such as directions, background-foreground relations, touching, non-touching etc.

5) Find from the parallel corpora or word co-occurrence data the most likely set of sentences containing these words

By now we have the tuple of the objects, the scene, and the relationships between them. But these elements alone are not enough to generate a meaningful sentence. In order to generate a relevant sentence, we use the notion of “Distributed Intelligence”: we infuse a knowledge source into the problem in order to solve it. Infusing knowledge means providing a dataset to the computer. The dataset used here is Wikipedia. Based on this dataset, an ontology is populated. The ontology is not hand-crafted; it is built automatically, so it is a statistical, probabilistic graphical model. It is an undirected graphical model along the lines of a Conditional Random Field (CRF) or Markov Random Field (MRF). The tuple <obj 1, obj 2…obj n, Scene, Relations> is compared with the Wikipedia dataset, and the top 5 sentences containing the combination of words in the tuple are extracted. We have prior knowledge in the form of Wikipedia sentences.
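The tuple and the retrieval of candidate sentences can be sketched as below. This is a deliberately simplified stand-in: it ranks sentences by plain word overlap with the tuple rather than by the probabilistic graphical model described in the text. `SceneTuple`, `tokenize`, `top_sentences` and the toy corpus are hypothetical names introduced only for illustration.

```python
import re
from collections import namedtuple

# Hypothetical container mirroring <obj 1, ..., obj n, Scene, Relations>.
SceneTuple = namedtuple("SceneTuple", ["objects", "scene", "relations"])


def tokenize(text):
    """Lower-case a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def top_sentences(scene_tuple, corpus, k=5):
    """Return the k corpus sentences sharing the most distinct words with the tuple.

    Word overlap is an assumed scoring rule, used here in place of the
    CRF/MRF-style model and Bayesian integration named in the paper.
    """
    query = set()
    for term in (*scene_tuple.objects, scene_tuple.scene, *scene_tuple.relations):
        query |= tokenize(term)

    scored = []
    for sentence in corpus:
        overlap = len(query & tokenize(sentence))
        if overlap > 0:
            scored.append((overlap, sentence))

    # sort() is stable, so sentences with equal overlap keep corpus order.
    scored.sort(key=lambda pair: -pair[0])
    return [sentence for _, sentence in scored[:k]]
```

With `k=5` this mirrors the "top 5 sentences" of the text. In a fuller version, the overlap count would be replaced by a likelihood estimated from the corpus.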
The likelihood is therefore calculated from this prior information, and the final answer is obtained after integrating both in a Bayesian framework.

6) Perform Google image search and find relevant images for each such sentence

The 5 most relevant sentences containing the words of the tuple are extracted from the Wikipedia dataset. The Google Image Search API provides a JavaScript interface to embed Google Image Search results in a website or application. For each of the 5 sentences, the following steps are performed:
- Give the sentence retrieved in the above step as input to the Google Image API.
- Download the top 10 images for the sentence.
- Compare each of the 10 images with the original image and generate a similarity score.
- Take the image that best matches the actual image and keep it in a table along with its score.
- Repeat this process for all 5 sentences, which gives a table of 5 images with their scores.

7) Compare retrieved images with the input image and assign similarity score

The 10 images downloaded for each sentence are compared with the actual image. Various algorithms can be used for the comparison, such as key point matching, histogram methods, feature matching techniques with decision trees etc. One can also use distance metrics such as the Euclidean distance, the Manhattan (also called city block) distance, and the Chebyshev distance. A similarity measure must be selected to decide how close one vector is to another. The problem can be converted to computing the discrepancy between two vectors $x, y \in \mathbb{R}^d$. The Euclidean distance between $x, y \in \mathbb{R}^d$ is computed by:

$$d_E(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$

The Manhattan (city block) distance, which takes fewer operations, is computed by:

$$d_M(x, y) = \sum_{i=1}^{d} |x_i - y_i|$$

8) Find the image which has the highest similarity score with the input image. The sentence corresponding to this image is most likely the one for our image too.

So far, we have obtained similarity scores for all the retrieved relevant images.
Of the 5 sentences and their 10 relevant images per sentence, the image with the highest similarity score, that is, the least distance, is picked. The sentence for the highest-scoring image is taken as the sentence for the actual image too. The RGB Color Histograms of the actual image and of all the other images are generated. Then the Euclidean distance between the histogram of the actual image and that of each other image is calculated. The image with the least distance to the actual image is taken as the intermediate result. The sentence corresponding to this intermediate result is the sentence that describes the scene in the input image.
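Steps 7 and 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are represented as plain lists of (r, g, b) tuples instead of decoded image files, the number of histogram bins (`BINS = 4`) is an assumed value the paper does not give, and the names `rgb_histogram` and `pick_sentence` are hypothetical.

```python
import math

BINS = 4  # bins per colour channel; assumed, the paper does not state a value


def rgb_histogram(pixels, bins=BINS):
    """Normalised joint RGB histogram of a non-empty list of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = len(pixels)
    return [count / total for count in hist]


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (city block) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pick_sentence(input_pixels, candidates, distance=euclidean):
    """Return (sentence, distance) for the candidate whose best image is closest.

    candidates is a list of (sentence, [image_pixels, ...]) pairs, one pair
    per retrieved sentence with its downloaded images.
    """
    target = rgb_histogram(input_pixels)
    table = []
    for sentence, images in candidates:
        # Keep only the best-matching image for this sentence (step 6 table).
        best = min(distance(target, rgb_histogram(img)) for img in images)
        table.append((best, sentence))
    best_distance, best_sentence = min(table)  # least distance wins (step 8)
    return best_sentence, best_distance


# Toy data: a mostly red input image and two candidate sentences.
red = [(250, 10, 10)] * 4
mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
blue = [(10, 10, 250)] * 4
candidates = [
    ("A boat on the sea.", [blue]),
    ("A red car on the road.", [mostly_red, blue]),
]
sentence, dist = pick_sentence(red, candidates)
print(sentence, dist)
```

Passing `distance=manhattan` swaps in the city block metric without changing the rest of the selection logic. A colour histogram ignores where colours appear in the image, so this comparison only captures overall colour similarity.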
  • 4. Fig 2: Architecture of our approach

4. EXPERIMENTS AND RESULTS

As the first step of our implementation, we start with detecting objects from the image. According to [5], the ImageNet 1000-class competition dataset and the PASCAL VOC detection dataset are used.

Fig 3: Object Detection and Labelling

There are multiple bounding boxes detected in the above step. The (x, y) coordinates of all the bounding boxes are detected along with the probability of existence of that object. From these coordinates, the spatial relationship between the objects can be established.

We compare our results with those of our base paper, Babytalk [4]. The comparison is based on two metrics: BLEU and ROUGE.

BLEU Score Measured for Generated Descriptions versus the Set of Descriptions Produced by Human Annotators

No. of Images    Babytalk [4]    Our Approach
100              0.40            0.42
200              0.45            0.37
500              0.38            0.37

ROUGE Score Measured for Generated Descriptions versus the Set of Descriptions Produced by Human Annotators

No. of Images    Babytalk [4]    Our Approach
100              0.43            0.44
200              0.41            0.33
500              0.36            0.38
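To make the BLEU figures above concrete, the following is a minimal sketch of a sentence-level BLEU score: clipped n-gram precision combined with a brevity penalty. The paper does not say which BLEU variant, n-gram order, or tokenisation was used, so the whitespace tokenisation, the absence of smoothing, and the function names here are assumptions for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a tokenised candidate against tokenised references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    # Use the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > ref_len else math.exp(1.0 - ref_len / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


score = bleu("a dog on the grass", ["a dog is sitting on the grass"], max_n=2)
print(score)
```

In this example every unigram of the candidate appears in the reference, but only three of its four bigrams do, and the candidate is shorter than the reference, so the score falls below 1 even though the two sentences describe the same scene. This is the kind of purely lexical comparison the metric performs.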
  • 5. 5. DISCUSSION ON RESULTS OBTAINED

We compare our results only with Babytalk [4], since that work comes closest to ours. The research community uses the BLEU score or the ROUGE score, which are essentially measures of lexical similarity between the sentences in the ground truth data and the sentences our algorithm generates. This is not entirely appropriate, since BLEU and ROUGE miss the semantic closeness of the sentences. We find that many of the sentences our algorithm generated agree with human judgment but do not match the ground truth sentences exactly.

For some of the images for which our algorithm fails to find a description, the scene backgrounds were not found in our training images. Our algorithm relies on transferring acquired knowledge to new, unseen images, and hence it fails when a new scene background is encountered. However, this problem can be overcome if we expand our training dataset to include many possible backgrounds.

Another source of error was misidentification of objects. This too can be resolved if more training data samples are used. Lighting conditions, quality of images, and other noise factors were also responsible for somewhat poor object recognition.

6. CONCLUSION AND FUTURE WORK

We propose an approach that generates sentences from an image by understanding the entities in the image and their relationships to each other, and that further uses Distributed Intelligence in order to generate proper sentences. The activities carried out in the entire process are: object and scene recognition, tuple generation, sentence extraction, image search, and taking the sentence with the highest similarity score as output.
Based on further experience, the work can be expanded in future so as to increase its efficiency.

- Here, we record the top 5 sentences from the web depending on the likelihood of the words from the tuple. This reduces the chance of describing unusual activities captured in the image. In future, this limitation can be eliminated.
- The object detection technique falls short in detecting smaller objects in the background. This can also be improved so that minute objects get recognized.
- Instead of the Wiki dataset, one can use a much larger dataset that covers a wide range of sentences, including ones that describe unusual activities.

7. REFERENCES

[1] Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio; Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015
[2] Andrej Karpathy, Li Fei-Fei, Department of Computer Science, Stanford University; Deep Visual-Semantic Alignments for Generating Image Descriptions, 2015
[3] Ankush Gupta, Yashaswi Verma, C.V. Jawahar; Choosing Linguistics over Vision to Describe Images, 2012
[4] Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, Tamara L Berg, Stony Brook University, NY 11794, USA; Baby Talk: Understanding and Generating Simple Image Descriptions, 2013
[5] Joseph Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗†, University of Washington∗, Allen Institute for AI†, Facebook AI Research¶; You Only Look Once: Unified, Real-Time Object Detection, 2016
[6] Joseph Redmon, Darknet: Open Source Neural Networks in C, http://pjreddie.com/yolo/, 2013-2016
[7] M. Lin, Q. Chen, and S. Yan. Network in network. CoRR, abs/1312.4400, 2013.