Show, Observe and Tell: Attribute-driven Attention
Model for Image Captioning
Hui Chen, Guiguang Ding, Zijia Lin, Sicheng Zhao, Jungong Han
IJCAI - 2018
INTRODUCTION
• Given an image, produce a sentence describing its contents
• Inputs: An image
• Outputs: Multiple words (let’s consider one sentence), e.g. "The dog is hiding"
Introduction to Image Captioning
[Figure, built up over four slides: a CNN feeds an RNN with hidden states h1, h2 and h3; a linear classifier on h2 and h3 emits the words "The" and "dog".]
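The CNN-RNN pipeline in these slides can be sketched as a greedy decoding loop. This is a toy illustration, not the paper's model: the vocabulary, the random weights, the plain tanh recurrence and the names `greedy_caption` and `init_params` are all assumptions made for the example, and the image feature stands in for a real CNN output.

```python
import math
import random

# Toy vocabulary (assumption); a real model uses thousands of words.
VOCAB = ["<start>", "<end>", "the", "dog", "is", "hiding"]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(feat_dim, hidden, embed_dim, seed=0):
    """Random, untrained weights: the output words are arbitrary."""
    rng = random.Random(seed)
    return {
        "W_img": rand_matrix(hidden, feat_dim, rng),
        "W_h": rand_matrix(hidden, hidden, rng),
        "W_x": rand_matrix(hidden, embed_dim, rng),
        "W_o": rand_matrix(len(VOCAB), hidden, rng),
        "embed": rand_matrix(len(VOCAB), embed_dim, rng),
    }

def greedy_caption(image_feature, params, max_len=10):
    """Unroll the RNN and pick the highest-scoring word at every step."""
    # First hidden state comes from the (pretend) CNN feature.
    h = [math.tanh(x) for x in matvec(params["W_img"], image_feature)]
    word = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        x = params["embed"][word]
        pre = [a + b for a, b in zip(matvec(params["W_h"], h),
                                     matvec(params["W_x"], x))]
        h = [math.tanh(p) for p in pre]
        logits = matvec(params["W_o"], h)  # linear classifier over the vocabulary
        word = max(range(len(VOCAB)), key=logits.__getitem__)
        if VOCAB[word] == "<end>":
            break
        caption.append(VOCAB[word])
    return caption

params = init_params(feat_dim=4, hidden=8, embed_dim=3)
caption = greedy_caption([0.2, -0.1, 0.7, 0.4], params)
print(caption)
```

With trained weights the same loop would produce a sentence such as "the dog is hiding"; here it only shows the data flow from image feature to hidden states to words.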
Motivation
• Static representation: irrespective, redundant
Ideas of paper
• Objective
• More compact representations should be explored to improve attention accuracy
• Solutions
• A CNN-RNN framework with an attention mechanism to predict attributes
• Model co-occurrence dependencies among attributes. For example, the object terms "women" and "umbrella" can help to recognize the relational term "under"
Related Work
Attribute-based method
1) Image captioning with semantic attention (You et al. 2016)
2) Boosting image captioning with attributes (Yao et al. 2017)
3) Semantic regularization for recurrent image annotation (Liu et al. 2017)
Attributes are mainly predicted by a CNN
Cannot model the co-occurrence dependencies of attributes
Visual attention method
Many visual attention schemes have been introduced recently:
1) Show, attend and tell (Xu et al. 2015) – soft + hard attention
2) SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning (Chen et al. 2017) – channel attention
3) Bottom-up and top-down attention for image captioning and VQA (Anderson et al. 2017) – bottom-up + top-down attention
Methodology
Overview architecture
• Encoder – Decoder paradigm
Attention module
• Region-based feature: soft attention over regions to get the context feature v*_t
• Attribute-based feature: attend to a series of context features to acquire the attribute-based feature c*_t
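The soft attention step over regions can be sketched as follows. This is a minimal illustration under stated assumptions: dot-product scoring between each region feature and the decoder state is chosen for simplicity (the paper may well use a learned scoring network), and the function name `soft_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(regions, hidden):
    """Return (weights, context): one weight per region and their weighted sum."""
    # Assumption: score each region by its dot product with the hidden state.
    scores = [sum(r_i * h_i for r_i, h_i in zip(r, hidden)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context

# Three toy region features and a toy decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, v_t = soft_attention(regions, hidden)
print(weights, v_t)
```

Regions that align with the decoder state get larger weights, so the context feature is dominated by them. The weights always sum to 1, which is what makes the attention "soft" rather than a hard selection of one region.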
Inference module
• Capture co-occurrence dependencies among attributes
• v*_t is the context feature from the attention module
• The binary vector I_s has one entry per attribute: 1 means that the image has the corresponding attribute, and 0 means that it does not
• The inference layer generates the distribution of the next attribute to be predicted via a softmax function
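A rough sketch of how a sequence of softmax predictions could be turned into the binary attribute vector. The attribute list, the stop symbol and the fixed per-step scores are all assumptions for illustration; in the real model each step's scores would come from the RNN and depend on the context feature and on previously predicted attributes.

```python
import math

ATTRIBUTES = ["woman", "umbrella", "under", "dog"]  # toy attribute vocabulary (assumption)
END = len(ATTRIBUTES)  # extra index used as a stop symbol (assumption)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_attributes(step_logits):
    """Greedy pass: one softmax per step, stop at END, mark chosen attributes."""
    i_s = [0] * len(ATTRIBUTES)
    for logits in step_logits:
        probs = softmax(logits)
        choice = max(range(len(probs)), key=probs.__getitem__)
        if choice == END:
            break
        i_s[choice] = 1
    return i_s

# Placeholder scores, one row per step; the last column is the stop symbol.
step_logits = [
    [3.0, 1.0, 0.5, 0.1, 0.0],
    [0.2, 2.5, 1.0, 0.1, 0.3],
    [0.1, 0.2, 2.0, 0.0, 0.5],
    [0.0, 0.1, 0.2, 0.1, 4.0],
]
i_s = predict_attributes(step_logits)
print(i_s)  # [1, 1, 1, 0]
```

Here "woman", "umbrella" and "under" are predicted in turn before the stop symbol wins, so only "dog" stays at 0.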
Generation module
• Generate a sentence word by word
• I_s acts as a semantic regularization: it forces the RNN to understand the image at the beginning
Experiments
Dataset
• MS COCO 2014 dataset: 123,287 images, 5 captions per image
• 5000 for validation
• 5000 for test
• Remaining for training
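The split sizes follow from simple arithmetic, worked out below. The caption count assumes exactly five captions per image, as the slide states.

```python
TOTAL_IMAGES = 123287
VAL_IMAGES = 5000
TEST_IMAGES = 5000
CAPTIONS_PER_IMAGE = 5  # as stated on the slide

train_images = TOTAL_IMAGES - VAL_IMAGES - TEST_IMAGES
train_captions = train_images * CAPTIONS_PER_IMAGE
print(train_images, train_captions)  # 113287 566435
```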
Evaluation metrics
• BLEU
• ROUGE-L
• METEOR
• CIDEr
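To make concrete what an overlap-based metric measures, here is a sketch of the clipped (modified) n-gram precision that BLEU builds on. It is only one component: full BLEU also combines several n-gram orders with a geometric mean and applies a brevity penalty, both omitted here. The function names and example sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    """Fraction of candidate n-grams found in the references, with clipped counts."""
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    overlap = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return overlap / sum(cand.values())

candidate = "the dog is sleeping".split()
references = [
    "the dog is hiding under a blanket".split(),
    "a dog hides under the covers".split(),
]
p1 = clipped_precision(candidate, references, 1)
p2 = clipped_precision(candidate, references, 2)
print(p1, p2)
```

Three of the four candidate words and two of its three bigrams appear in the references, so the scores are 0.75 and about 0.667, even though "sleeping" changes the meaning of the caption.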
Compared approaches
Ablation analysis
• The second row shows that the proposed approach to predicting attributes is effective
Qualitative analysis
• Detect object terms, relational terms and descriptive terms
• New attributes can be explored
Conclusion
Take-home messages
• This paper introduces an attribute detection mechanism that uses visual attention and infers attribute dependencies, allowing more accurate attention for image captioning. An RNN is effective in modeling these dependencies.
• Semantic regularization (via the binary vector I_s) could be adopted more widely in image captioning to gain better performance.
• The authors conduct extensive experiments to demonstrate the effectiveness of the proposed framework.
• Detailed analyses (ablation and qualitative) are provided for further insights and research in this domain.
Quizzes
1) What kinds of attention mechanisms are used in this paper?
• Region-based attention from the feature map
• Attribute-based attention from context features
2) What is the size of I_s?
• The size of I_s is determined by the set of attributes obtained from the ground-truth captions. This vector covers object, relational and descriptive terms.
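One way to picture the answer to quiz 2 is to build a toy attribute vocabulary from captions and encode an image against it. The selection rule used here (most frequent non-stopword tokens), the stop list and the helper names are assumptions for illustration; the slides only say that the attributes come from the ground-truth captions.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "is", "of"}  # toy stop list (assumption)

def build_attribute_vocab(captions, size):
    """Keep the `size` most frequent non-stopword tokens (ties broken alphabetically)."""
    counts = Counter(
        w for c in captions for w in c.lower().split() if w not in STOPWORDS
    )
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:size]

def encode(image_captions, vocab):
    """Binary vector: 1 if the attribute occurs in any caption of the image."""
    words = {w for c in image_captions for w in c.lower().split()}
    return [1 if a in words else 0 for a in vocab]

captions = [
    "a woman under an umbrella",
    "the woman holds a red umbrella",
    "a dog is hiding under the table",
]
vocab = build_attribute_vocab(captions, size=4)
i_s = encode(captions[:2], vocab)  # treat the first two captions as one image
print(vocab, i_s)
```

The length of the vector equals the vocabulary size chosen up front, and the vocabulary mixes object terms ("woman", "umbrella", "dog") with a relational term ("under").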
Discussion
1) Attributes could be enriched as in Neural Baby Talk (cat -> tabby, kitten, feline, …)
2) Besides beam search, teacher forcing could be applied to speed up training and may give better performance.
3) The metrics in this paper are weak even for a 2018 paper: BLEU, ROUGE and CIDEr are mostly based on overlap between the prediction and the ground truth. More advanced metrics like SPICE should also be reported to validate this approach.
Thank you for your attention!

Show observe and tell giang nguyen

  • 1. Show, Observe and Tell: Attribute-driven Attention Model for Image Captioning Hui Chen, Guiguang Ding, Zijia Lin, Sicheng Zhao, Jungong Han IJCAI - 2018
  • 3. Introduction to Image Captioning • Given an image, produce a sentence describing its contents • Inputs: an image • Outputs: multiple words (let’s consider one sentence), e.g. "The dog is hiding"
  • 6. Introduction to Image Captioning [Figure: a CNN feeding a chain of RNN steps with hidden states h1, h2, h3; linear classifiers on the hidden states output the words "The" and "dog"]
  • 7. Motivation [Figure: the same CNN-RNN diagram, annotated with the labels "static representation", "irrespective" and "redundant"]
  • 8. Ideas of paper • Objective • More compact representations should be explored to gain better attention accuracy • Solutions • A CNN-RNN framework with the attention mechanism to predict attributes • Model co-occurrence dependencies among attributes. For example, the object terms "women" and "umbrella" can help to recognize the relational term "under"
  • 10. Attribute-based method 1) Image captioning with semantic attention (You et al. 2016) 2) Boosting image captioning with attributes (Yao et al. 2017) 3) Semantic regularization for recurrent image annotation (Liu et al. 2017) • Attributes are mainly predicted by a CNN • Cannot model co-occurrence dependencies of attributes
  • 11. Visual attention method Many visual attention schemes have been introduced recently: 1) Show, attend and tell (Xu et al. 2015) – soft + hard attention 2) SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning (Chen et al. 2017) – channel attention 3) Bottom-up and top-down attention for image captioning and VQA (Anderson et al. 2017) – bottom-up + top-down attention
  • 13. Overview architecture • Encoder – Decoder paradigm
  • 14. Attention module • Region-based feature: soft attention over regions to get the context feature v*_t • Attribute-based: attend to a series of context features to acquire the attribute-based feature c*_t
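The soft attention over regions on slide 14 can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the dot-product scoring, the function names `softmax` and `soft_attention`, and the toy 2-dimensional features are all assumptions made for the example (the paper may score regions differently, e.g. with a learned layer).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions, query):
    """Weight each region by its similarity to the query and average.

    regions: list of feature vectors, one per image region (toy data here).
    query:   vector standing in for the decoder state (assumption).
    Returns (weights, context); context plays the role of v*_t.
    """
    # Assumed scoring function: plain dot product between region and query.
    scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * region[d] for w, region in zip(weights, regions))
               for d in range(dim)]
    return weights, context


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = soft_attention(regions, query)
print(weights, context)
```

Because the weights sum to 1, the context feature is a convex combination of the region features, so regions that match the query more closely contribute more.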
  • 15. Inference module • Capture co-occurrence dependencies among attributes, in which v*_t is the context feature from the attention module • Binary vector I_s, where 1 means that the image has the corresponding attribute and 0 means that it does not • The inference layer generates the distribution of the next attribute to be predicted via a softmax function
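The two outputs named on slide 15, a softmax distribution over the next attribute and a binary attribute vector, can be illustrated with a small sketch. The attribute vocabulary, the logits and the helper names below are hypothetical and chosen only for the example; in the model the logits would come from the inference layer.

```python
import math

# Toy attribute vocabulary (assumption, not taken from the paper).
ATTRIBUTES = ["woman", "umbrella", "under", "dog"]


def next_attribute_distribution(logits):
    """Softmax over attribute logits: distribution of the next attribute."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_attribute_vector(predicted, vocabulary):
    """1 if the image has the attribute, 0 otherwise, in vocabulary order."""
    present = set(predicted)
    return [1 if attribute in present else 0 for attribute in vocabulary]


# Made-up logits standing in for the inference layer's output at one step.
probs = next_attribute_distribution([2.0, 1.0, 0.5, -1.0])
best = ATTRIBUTES[probs.index(max(probs))]

# After several prediction steps, the predicted attributes are collapsed
# into the binary vector that is passed on to the generation module.
vector = binary_attribute_vector(["woman", "umbrella", "under"], ATTRIBUTES)
print(best, vector)
```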
  • 16. Generation module • Generate a sentence word by word • I_s acts as a semantic regularization, forcing the RNN to understand the image at the beginning
  • 18. Dataset • MS COCO 2014 dataset - 123287 images - 5 captions per image • 5000 for validation • 5000 for test • Remaining for training
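The split sizes on slide 18 imply the training set size by simple subtraction. The sketch below works it out, taking the slide's "5 captions per image" as exact, which is an assumption (some images may have more).

```python
TOTAL_IMAGES = 123287       # from the slide
VALIDATION_IMAGES = 5000    # from the slide
TEST_IMAGES = 5000          # from the slide
CAPTIONS_PER_IMAGE = 5      # treated as exact here (assumption)

train_images = TOTAL_IMAGES - VALIDATION_IMAGES - TEST_IMAGES
train_captions = train_images * CAPTIONS_PER_IMAGE

print(train_images)    # 113287
print(train_captions)  # 566435
```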
  • 19. Evaluation metrics • BLEU • ROUGE-L • METEOR • CIDEr
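To make the overlap-based nature of these metrics concrete, here is a minimal sketch of the clipped n-gram precision that BLEU is built on. It is only one component: full BLEU also combines several n-gram orders and applies a brevity penalty, both omitted here. The function names and the example sentences are assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Fraction of candidate n-grams found in the references.

    Each n-gram count is clipped to its maximum count in any single
    reference, so repeating a matching word does not inflate the score.
    """
    candidate_counts = Counter(ngrams(candidate, n))
    if not candidate_counts:
        return 0.0
    max_reference = Counter()
    for reference in references:
        for gram, count in Counter(ngrams(reference, n)).items():
            max_reference[gram] = max(max_reference[gram], count)
    clipped = sum(min(count, max_reference[gram])
                  for gram, count in candidate_counts.items())
    return clipped / sum(candidate_counts.values())


candidate = "the dog is hiding".split()
references = ["a dog is hiding under the bed".split()]
p1 = clipped_precision(candidate, references, n=1)
p2 = clipped_precision(candidate, references, n=2)
print(p1, p2)
```

In this example every candidate word appears in the reference, so the unigram precision is perfect even though the candidate omits "under the bed"; this is the kind of surface overlap the score rewards.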
  • 21. Ablation analysis • The second row shows that the proposed approach to predict attributes is effective
  • 22. Qualitative analysis • Detect object terms, relational terms and descriptive terms • New attributes can be explored
  • 24. Take-home messages • This paper introduces an attribute detection mechanism that uses visual attention and infers attribute dependencies, allowing more accurate attention for image captioning. An RNN is effective in modeling dependencies. • Semantic regularization (forcing the binary vector) could be adopted more widely in image captioning to gain better performance. • The authors conduct extensive experiments to demonstrate the effectiveness of the proposed framework. • Detailed analyses (ablation and qualitative) are provided for further insights and research in this domain.
  • 25. Quizzes 1) What kind of attention mechanisms are being used in this paper? • Region-based attention from the feature map • Attribute-based attention from context features 2) What is the size of I_s? • The size of I_s is determined by the attributes that we have obtained from ground-truth captions. This vector covers object, relational and descriptive terms
  • 26. Discussion 1) Attributes could be refined as in Neural Baby Talk (cat -> tabby, kitten, feline, …) 2) Besides beam search, teacher forcing could be applied to speed up training and may give better performance. 3) The metrics in this paper are weak even though it was written in 2018: BLEU, ROUGE and CIDEr are mostly based on overlap between the prediction and the ground truth. More advanced metrics such as SPICE should also be evaluated to confirm the feasibility of this approach.
  • 27. Thank you for your attention!