FROM IMAGES TO SENTENCES
SCENE
DESCRIPTION
KHUSHALI ACHARYA
1517MECE30008
AGENDA
1 Introduction
2 Motivation
3 Related research work
4 Our Approach
5 Conclusion
INTRODUCTION TO
SCENE DESCRIPTION
1
WHAT IS IT?
“Interpreting images and generating sentences.”
“Scene interpretation means understanding everyday occurrences or recognizing rare events.”
“Scene interpretations are Controlled Hallucinations.”
WHY DO WE NEED SUCH SYSTEMS?
Isn't a picture enough to depict the
things clearly?
Ever imagined a cricket match without commentary?
Or a movie without dialogue?
It has been estimated that more than
80% of the activities we do online are
text-based.
A layperson cannot understand
medical reports unless
a doctor explains them
or they are in written
form.
Medical reports
A tagged location becomes clear when the place name is
given.
Thus a description makes an image
more informative and more
interesting.
MOTIVATION TO THE
PROBLEM
2
WHAT ARE THE REAL
TIME APPLICATIONS?
2.1
• Self-aware cognitive robots
• Assisting visually impaired people
• Soccer game analysis
• Image search/retrieval systems
• Street traffic observation
• Criminal act recognition
• Agricultural sector
Some Applications of scene description
HOW IS IT HELPFUL
TO SOCIETY?
2.2
• Assists Visually Impaired People
Screen Reader
Screen readers are software programs that
convert text into synthesized speech so that blind
people can listen to web content.
LIMITATIONS:
• Screen readers cannot describe images.
• Screen readers cannot survey the entirety of a
web page as a sighted user might. They cannot
always intelligently skip over extraneous
content, such as advertisements or navigation
bars.
Scene description can fill this gap by describing images
to blind users.
• Criminal Act Recognition
Images are captured and unusual activities are
recorded.
Features extracted from the images then help in
crime investigation.
• Human Computer Interaction
Efficient and consistent scene interpretation is a
prerequisite for self-aware cognitive robots to work.
Object recognition and scene interpretation
Spatial relation extraction
RELATED EXISTING
RESEARCH WORK
3
RESEARCH PAPERS
3.1
1. PAPER: “Midge: Generating image descriptions from computer vision detections”
   U. of Aberdeen and Oregon Health and Science University, Stony Brook University, U. of Maryland, Columbia University, U. of Washington, MIT.
   PROPOSED: This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections.
   DATASET: For training: 700,000 (Flickr, 2011) images with associated descriptions from the dataset in Ordonez et al. (2011). For evaluation: 840 PASCAL images.
   CONCLUSION: Midge generates a well-formed description of an image by filtering attribute detections that are unlikely and placing objects into an ordered syntactic structure.

2. PAPER: “Every picture tells a story: Generating sentences from images.”
   Farhadi, A., Hejrati, S. M. M., Sadeghi, M. A., Young, P., Rashtchian, C., Hockenmaier, J., and Forsyth, D. A. (2010). Springer.
   PROPOSED: Attempts to “generate” sentences by first learning from a set of human-annotated examples, and producing the same sentence if both image and sentence share common properties in terms of their triplets: (Nouns-Verbs-Scenes).
   DATASET: PASCAL 2008 images with human annotation.
   CONCLUSION: Sentences are rich, compact and subtle representations of information. Even so, we can predict good sentences for images that people like. The intermediate meaning representation is one key component in our model as it allows benefiting from distributional semantics.
3. PAPER: “Babytalk: Understanding and generating simple image descriptions”
   G Kulkarni, V Premraj, V Ordonez. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 12, December 2013.
   PROPOSED: It uses detectors for object and scene detection and builds a quadruplet: (Nouns-Verbs-Scenes-Preposition).
   DATASET: PASCAL 2008 images.
   CONCLUSION: Human forced-choice experiments demonstrate the quality of the generated sentences over previous approaches. One key to the success of our system was automatically mining and parsing large text collections to obtain statistical models for visually descriptive language.

4. PAPER: “Choosing Linguistics over Vision to Describe Images”
   Ankush Gupta, Yashaswi Verma, C. V. Jawahar. International Institute of Information Technology, Hyderabad, India – 500032.
   PROPOSED: Addresses the problem of automatically generating human-like descriptions for unseen images, given a collection of images and their corresponding human-generated descriptions.
   DATASET: PASCAL dataset.
   CONCLUSION: They proposed a novel approach for generating relevant, fluent and human-like descriptions for images without relying on any object detectors, classifiers, hand-written rules or heuristics.
THEIR APPROACH
3.2
1.) Choosing Linguistics over Vision to
Describe Images
i. Given an unseen image,
ii. find the K most similar training images
and extract phrases from their descriptions,
iii. generate a ranked list of triples, which is then used to
compose a description for the new image.
i.) Input image
ii.) Neighboring images with extracted phrases
iii.) Triple selection and sentence generation
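The three steps above can be sketched in a few lines of Python. This is a toy illustration, not the method of Gupta et al.: the two-dimensional feature vectors, the Euclidean distance, the frequency-based ranking, the `describe` template and all names (`rank_triples`, `TRAINING`) are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_triples(query_features, training_set, k=2):
    """Rank triples by how often they occur among the k nearest images.

    training_set is a list of (features, triples) pairs, where each
    triple is a hypothetical (object, verb, scene) tuple.
    """
    neighbours = sorted(
        training_set, key=lambda item: euclidean(query_features, item[0])
    )[:k]
    counts = Counter(t for _, triples in neighbours for t in triples)
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def describe(triple):
    """Fill a fixed sentence template (an assumption of this sketch)."""
    obj, verb, scene = triple
    return f"A {obj} is {verb} {scene}."


# Hypothetical training images: feature vector plus extracted triples.
TRAINING = [
    ([0.9, 0.1], [("cow", "grazing", "along a roadside")]),
    ([0.8, 0.2], [("cow", "grazing", "along a roadside"),
                  ("cow", "standing", "in a field")]),
    ([0.1, 0.9], [("racer", "speeding", "through mud")]),
]

ranked = rank_triples([0.85, 0.15], TRAINING, k=2)
best = ranked[0][0]
print(describe(best))
```

Because the description is assembled only from neighbours' phrases, a poor neighbour match yields a fluent but wrong sentence.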
FAILURE SCENARIO
A motor racer is speeding
through a splash mud.
A water cow is grazing
along a roadside.
An orange fixture is hanging
in a messy kitchen.
OUR APPROACH
4
USING
OPENCV, NLP & SVM
4.1
OPENCV
• Open source computer vision and machine learning
software library.
• More than 2500 optimized algorithms.
• C++, C, Python, Java and MATLAB interfaces
• Supports Windows, Linux, Android and Mac OS
NLP (Natural Language Processing)
It is a field of computer science, artificial intelligence, and
computational linguistics concerned with the interactions
between computers and human (natural) languages.
SVM (Support Vector Machines)
A discriminative classifier formally
defined by a separating
hyperplane. In other words, given
labeled training data (supervised
learning), the algorithm outputs an
optimal hyperplane which
categorizes new examples.
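A separating hyperplane can be illustrated with a small linear SVM written in plain Python. This is a minimal sketch, assuming two-dimensional toy points and sub-gradient descent on the regularized hinge loss; it only approximates the optimal hyperplane and does not use OpenCV or any library SVM. The names `train_linear_svm`, `predict`, `POINTS` and `LABELS` are hypothetical.

```python
def train_linear_svm(points, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit w and b so that sign(w.x + b) separates the two classes.

    Uses sub-gradient descent on the hinge loss with L2 regularization.
    Labels must be +1 or -1.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin: move the hyperplane towards
                # classifying it correctly, and shrink w slightly.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply regularization only.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical labeled training data (supervised learning).
POINTS = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
          [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]]
LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(POINTS, LABELS)
print(predict(w, b, [1.8, 1.2]), predict(w, b, [-1.2, -1.8]))
```

New examples are categorized purely by which side of the learned hyperplane they fall on.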
SYSTEM FLOW
4.2
1. Take the query image as input.
2. Detect objects in the query image.
3. Search the corpus data and extract the shortest sentences.
4. Run the RDF (Resource Description Framework) parser to obtain triples of the form <object1,predicate1,object2>.
5. Retrieve the top 10 images with the Google image API.
6. Match the query image against each retrieved image and compute a score.
7. The highest-scoring image gives the matching triplet.
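The final matching step of the flow can be sketched as follows. The slides do not say how the score is computed, so this example assumes a Jaccard overlap between the object labels detected in the query image and those of each retrieved image. The retrieved images are hard-coded stand-ins rather than results from any image search API, and the names `jaccard`, `best_triple`, `QUERY_OBJECTS` and `RETRIEVED` are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two label collections, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_triple(query_objects, retrieved):
    """Score each retrieved image and return (score, triple) of the best.

    retrieved is a list of (object_labels, triple) pairs, where triple
    has the form (object1, predicate1, object2).
    """
    scored = [(jaccard(query_objects, objs), triple)
              for objs, triple in retrieved]
    return max(scored, key=lambda pair: pair[0])


# Objects detected in the query image (assumed detector output).
QUERY_OBJECTS = ["dog", "ball", "grass"]

# Stand-ins for retrieved images with their parsed triples.
RETRIEVED = [
    (["dog", "sofa"], ("dog", "sleeps on", "sofa")),
    (["dog", "ball", "grass"], ("dog", "chases", "ball")),
    (["cat", "ball"], ("cat", "plays with", "ball")),
]

score, triple = best_triple(QUERY_OBJECTS, RETRIEVED)
print(score, triple)
```

Any other similarity measure, such as one based on visual features, could replace `jaccard` without changing the rest of the flow.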
EXPERIMENTAL
SETUP
4.3
DATA SET
PASCAL (Pattern Analysis, Statistical Modeling and Computational
Learning)
It provides standardized image data sets for object class
recognition.
Technology
Java/Python
5. CONCLUSION
Thus we saw the fundamentals
of scene description, its
applications, previous work in
this field, and our approach to
designing this system.
THANK YOU!
