Rajesh Shreedhar Bhat & Souradip Chakraborty
Image Captioning using Attention Models
Session Agenda
❖ Intro to Image Captioning
❖ Real-life use cases of image captioning
❖ History & Evolution of Image Captioning
❖ Sequence to Sequence Learning in NLP
❖ Show and Tell: Sequence to Sequence Learning in Vision
❖ Intuition of Attention Mechanism from Translation perspective
❖ Show, Attend and Tell: Image Captioning with Attention
❖ Beam search in generative framework
❖ Code walk-through in PyTorch
Introduction to Image Captioning
❖ Image captioning is the task of generating a descriptive and appropriate sentence for a given image.
❖ Caption generation is a challenging AI problem because the model must both interpret the image and produce fluent text.
❖ Two tasks involved:
○ Understand the content of the image.
○ Turn the understanding of the image into words in the proper order.
Real Life Use Cases of Image Captioning
❖ Search using image captions.
❖ Getting live captions from CCTV/surveillance cameras.
❖ Aiding visually impaired people.
Every Picture Tells a Story: Generating Sentences from Images
❖ Defining a pre-defined template of Object-Action-Scene
❖ Mapping image and sentence to a latent meaning space
❖ Multi-label Markov Random Field to predict the triplet from the image
https://www.cs.cmu.edu/~afarhadi/papers/sentence.pdf
From Captions to Visual Concepts and Back
❖ Word Detection Stage & Multiple Instance Learning
❖ Sentence Generation Stage
❖ Ranking Generated Sentences
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Fang_From_Captions_to_2015_CVPR_paper.pdf
Encoders & Decoders in sequence-to-sequence learning
❖ Encoder: processes each word in the input sequence and compiles the information into a context vector C.
❖ The context vector C is passed to the decoder.
❖ The decoder also maintains a hidden state that it passes from one time step to the next.
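The encoder/decoder flow described above can be sketched in a few lines. This is a toy illustration in plain Python rather than PyTorch, so it runs without dependencies. The scalar "RNN", the weights `w_h` and `w_x`, and the function names are assumptions made for the example; they are not taken from the talk's code.

```python
import math

def rnn_step(hidden, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar RNN: new_hidden = tanh(w_h * hidden + w_x * x)."""
    return math.tanh(w_h * hidden + w_x * x)

def encode(inputs):
    """Read the input one element at a time; the final hidden state is the context C."""
    hidden = 0.0
    for x in inputs:
        hidden = rnn_step(hidden, x)
    return hidden

def decode(context, steps):
    """Start from C and pass the hidden state from one time step to the next."""
    hidden = context
    prev_output = 0.0  # stands in for a <start> token
    outputs = []
    for _ in range(steps):
        hidden = rnn_step(hidden, prev_output)
        outputs.append(hidden)
        prev_output = hidden  # the previous output feeds the next step
    return outputs

context = encode([0.1, 0.4, -0.2])  # hypothetical embedded input words
outputs = decode(context, steps=3)
```

In a real model the hidden state and C are vectors, the weights are learned, and each decoder output is projected onto the vocabulary; the data flow is the same.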
Sequence-to-Sequence models
❖ The encoder-decoder architecture has become very popular for solving sequence-to-sequence problems in the NLP domain.
❖ Examples: language translation, chatbots, text summarization, etc.
Show and Tell: A Neural Image Caption Generator
“Show and Tell: A Neural Image Caption Generator”: https://arxiv.org/pdf/1411.4555.pdf
❖ CNNs can efficiently encode the abstraction of images and generate a robust representation.
❖ This representation acts as the context/thought vector for the decoder.
❖ The decoder consists of standard LSTM/GRU units, the same as in sequence-to-sequence architectures.
Attention Mechanism Intuition
Source: https://distill.pub/2016/augmented-rnns/
Sequence to Sequence Model with Attention
Obtaining Context Vector using Attention
Source: http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
Attention is all you need: Brief Overview
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
❖ Inspired by the use of the attention mechanism in language translation.
❖ Focuses on the salient parts of the image while generating the corresponding word in the output sentence.
“Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”: https://arxiv.org/pdf/1502.03044.pdf
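A minimal sketch of soft attention over image regions, assuming each region is described by a small feature vector and the decoder state has the same size. The dot-product scoring, the function names, and the numbers are assumptions for illustration; the paper learns its scoring function, and the talk's PyTorch code may differ.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(region_features, decoder_hidden):
    """Score each region against the decoder state, then take the weighted average."""
    scores = [dot(f, decoder_hidden) for f in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * f[i] for w, f in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context

# Three hypothetical image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]  # hypothetical decoder hidden state
weights, context = attend(regions, hidden)
```

Here the first region matches the decoder state best, so it receives the largest weight and dominates the context vector used to predict the next word.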
Caption Generation using Beam Search
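❖ The greedy approach, which picks the word with the highest score at each step and uses it to predict the next word, is not optimal.
❖ Reason: the rest of the sequence hinges on the first word chosen, so a poor early choice cannot be undone.
❖ Beam search keeps the k highest-scoring partial sequences at each step. It is useful for any language generation problem because it usually finds a higher-scoring sequence than greedy decoding, although it is not guaranteed to find the optimal one.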
Beam Size = 2
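A small sketch of beam search against greedy decoding. The lookup table `NEXT` is a made-up stand-in for a trained captioning model, and the function names are assumptions for the example. Setting `beam_size=1` gives greedy decoding.

```python
import math

# Hypothetical toy model: next-word probabilities given the words so far.
NEXT = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.55, "cat": 0.45},
    ("the",): {"dog": 0.9, "cat": 0.1},
}

def next_probs(words):
    """Return the next-word distribution; unknown prefixes end the caption."""
    return NEXT.get(tuple(words), {"<end>": 1.0})

def beam_search(beam_size, max_len=3):
    """Keep the beam_size best partial captions (by log-probability) at each step."""
    beams = [(0.0, [])]  # (log-probability, words so far)
    for _ in range(max_len):
        candidates = []
        for logp, words in beams:
            if words and words[-1] == "<end>":
                candidates.append((logp, words))  # finished captions carry over
                continue
            for word, p in next_probs(words).items():
                candidates.append((logp + math.log(p), words + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]

greedy_logp, greedy_words = beam_search(beam_size=1)
beam_logp, beam_words = beam_search(beam_size=2)
```

With this table, greedy decoding commits to "a" (0.6) and ends with "a dog" at probability 0.33, while a beam of size 2 keeps "the" alive and finds "the dog" at probability 0.36.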
Code walkthrough - PyTorch
https://github.com/rajesh-bhat/dhs_summit_2019_image_captioning
Thank you!
