Visual Reference Resolution using
Attention Memory for Visual Dialog
Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, Leonid Sigal
{hsseo, bhhan}@postech.ac.kr {andreas.lehrmann, lsigal}@disneyresearch.com
NIPS 2017
Visual Dialog
• Similar to Visual Question Answering (VQA)
• Answering sequential questions grounded in a visual input.
• Ambiguous expressions arise from references to earlier parts of the dialog.
• Attention models dominate VQA
Q: How many people are on wheelchairs?
A: Two.
Q: What are their genders?
A: One male and one female.
Q: Which one is holding a racket?
A: The woman.
Expressions such as "their" and "which one" are ambiguous because they refer back to earlier questions.
Motivation
§ Attention models:
Dominant approach for answering independent questions (VQA).
§ Ambiguous expressions:
Visual references remain ambiguous without an understanding of the dialog context.
Q: What is sitting on the handlebar of a bicycle?
Q: What are their colors?
Attention-based Image Encoding
• Extract features from the relevant regions of the image using attention conditioned on the input question ($q$) and the dialog history ($H$).
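One common way to extract features under an attention map is a weighted sum of the per-location features. The sketch below assumes that form; the function name `attend`, the array shapes, and the random inputs are illustrative and not taken from the slides.

```python
import numpy as np


def attend(feature_map, attention):
    """Pool an (H, W, D) feature map into one D-vector.

    `attention` is an (H, W) map of non-negative weights summing to 1.
    Assumption: the attended feature is the attention-weighted sum of
    the per-location features.
    """
    h, w, d = feature_map.shape
    flat_features = feature_map.reshape(h * w, d)  # one row per location
    flat_weights = attention.reshape(h * w)
    return flat_weights @ flat_features


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))

# Uniform attention reduces to average pooling over all locations.
uniform = np.full((4, 4), 1.0 / 16)
pooled = attend(features, uniform)

# One-hot attention selects the feature at a single location.
one_hot = np.zeros((4, 4))
one_hot[2, 1] = 1.0
picked = attend(features, one_hot)
```

The two toy inputs mark the extremes: a flat map ignores the question entirely, while a peaked map reads out a single region.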
Visual Reference Resolution
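• Obtain the tentative attention $\boldsymbol{\alpha}_t^{\text{tent}}$ and the retrieved attention $\boldsymbol{\alpha}_t^{\text{mem}}$ using
  • Question embedding $\mathbf{c}_t$
  • Image feature map $\mathbf{f}$
  • Memory $\mathbf{M}_t = \{(\boldsymbol{\alpha}_\tau, \mathbf{k}_\tau) \mid 0 \le \tau \le t - 1\}$
[Figure: the question embedding $\mathbf{c}_t$ drives attention retrieval from the memory $\mathbf{M}_t$, yielding $\boldsymbol{\alpha}_t^{\text{mem}}$ and $\mathbf{k}_t^{\text{mem}}$, and tentative attention over the feature map $\mathbf{f}$, yielding $\boldsymbol{\alpha}_t^{\text{tent}}$.]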
Visual Reference Resolution
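• The final attention $\boldsymbol{\alpha}_t$ is computed by dynamically combining
  • Tentative attention $\boldsymbol{\alpha}_t^{\text{tent}}$
  • Retrieved attention $\boldsymbol{\alpha}_t^{\text{mem}}$
[Figure: attention retrieval and tentative attention feed a dynamic combination stage, whose weights $\mathbf{W}_{\text{DPL}}$ are predicted from $\mathbf{c}_t$, producing the final attention $\boldsymbol{\alpha}_t$.]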
Tentative Attention Computation
• An attention score is computed at every spatial location.
• The score is the inner product of the image feature at that location and the question embedding.
• This follows many VQA models.
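The scoring rule above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the image features and the question embedding already share a dimension (any learned projection before the inner product is omitted) and that scores are normalized with a softmax. The names `tentative_attention`, `f`, and `c_t` are chosen for this example.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def tentative_attention(feature_map, question_embedding):
    """Score every location by an inner product, then normalize.

    feature_map: (H, W, D); question_embedding: (D,).
    Assumption: both already live in the same D-dimensional space.
    """
    h, w, d = feature_map.shape
    scores = feature_map.reshape(h * w, d) @ question_embedding  # (H*W,)
    return softmax(scores).reshape(h, w)


# Toy input: only location (1, 2) carries a feature aligned with the question.
c_t = np.array([1.0, 2.0, 0.5])
f = np.zeros((3, 4, 3))
f[1, 2] = c_t
alpha_tent = tentative_attention(f, c_t)
```

In this toy case the attention map peaks at the one location whose feature matches the question embedding, and the map sums to 1.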
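Attention Retrieval from Memory
• Retrieval from key-value memory
• Sequential preference
[Figure: the projected question embedding $\mathbf{W}_{\text{mem}} \mathbf{c}_t$ is matched against the keys $\mathbf{k}_0, \mathbf{k}_1, \dots, \mathbf{k}_{t-1}$ to give addressing weights $\boldsymbol{\beta}_t$; weighted sums over the keys and over the stored attentions $\boldsymbol{\alpha}_0, \boldsymbol{\alpha}_1, \dots, \boldsymbol{\alpha}_{t-1}$ give $\mathbf{k}_t^{\text{mem}}$ and $\boldsymbol{\alpha}_t^{\text{mem}}$.]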
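The retrieval step can be sketched as a soft key-value lookup. This is a hedged illustration rather than the authors' code: the softmax addressing and the form of the sequential-preference term (a scalar `theta` multiplying the time distance between the current step and each stored step) are assumptions of this sketch, as are all array shapes and the function name `retrieve_attention`.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def retrieve_attention(keys, attentions, c_t, W_mem, theta=0.0):
    """Soft lookup in a key-value attention memory.

    keys:       (t, K)     keys k_0 .. k_{t-1}
    attentions: (t, H, W)  stored attention maps alpha_0 .. alpha_{t-1}
    c_t:        (C,)       current question embedding
    W_mem:      (K, C)     projection of c_t into key space
    theta:      scalar weight on the time distance t - tau (assumed form)
    """
    t = keys.shape[0]
    query = W_mem @ c_t
    scores = keys @ query                      # content-based match, (t,)
    distance = t - np.arange(t)                # t - tau for tau = 0 .. t-1
    beta = softmax(scores + theta * distance)  # addressing weights beta_t
    alpha_mem = np.tensordot(beta, attentions, axes=1)  # (H, W)
    k_mem = beta @ keys                                 # (K,)
    return alpha_mem, k_mem, beta


rng = np.random.default_rng(0)
t, K, C, H, W = 4, 5, 6, 3, 3
keys = np.zeros((t, K))  # uninformative keys: all content scores tie
attentions = rng.random(size=(t, H, W))
attentions /= attentions.sum(axis=(1, 2), keepdims=True)
c_t = rng.normal(size=C)
W_mem = rng.normal(size=(K, C))

# Without sequential preference the tie gives uniform addressing.
alpha_flat, _, beta_flat = retrieve_attention(keys, attentions, c_t, W_mem)
# A negative theta penalizes distant entries, favoring the most recent one.
alpha_recent, _, beta_recent = retrieve_attention(
    keys, attentions, c_t, W_mem, theta=-5.0
)
```

Because the retrieved attention is a convex combination of stored maps, it remains a valid attention map whenever the stored maps are.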
Dynamic Combination of Attentions
• Local merge using a convolution layer.
• Global merge using the dynamic parameter layer from (Noh et al., 2016).
• The weights $\mathbf{W}_{\text{DPL}}(\mathbf{c}_t)$ are dynamically predicted from the question embedding.
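A rough sketch of the two merge stages follows, under several loudly stated assumptions: the local merge is reduced to a 1x1 convolution over the two stacked maps, the dynamic weights come from a plain linear predictor (any parameter-sharing scheme used by the dynamic parameter layer of Noh et al. is not reproduced), and the result is normalized with a softmax. The names `combine_attentions`, `conv_weights`, and `predictor` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def combine_attentions(alpha_tent, alpha_mem, c_t, conv_weights, predictor):
    """Merge two (H, W) attention maps into the final attention.

    conv_weights: (2,)        1x1 convolution over the two stacked maps
    predictor:    (N * N, C)  hypothetical linear map from c_t to the
                              N x N dynamic weights, with N = H * W
    """
    h, w = alpha_tent.shape
    n = h * w
    stacked = np.stack([alpha_tent, alpha_mem], axis=-1)  # (H, W, 2)
    local = stacked @ conv_weights                        # local merge, (H, W)
    W_dpl = (predictor @ c_t).reshape(n, n)               # weights depend on c_t
    scores = W_dpl @ local.reshape(n)                     # global merge, (N,)
    return softmax(scores).reshape(h, w)


rng = np.random.default_rng(0)
H, W, C = 3, 3, 4
alpha_tent = np.full((H, W), 1.0 / (H * W))
alpha_mem = np.zeros((H, W))
alpha_mem[0, 0] = 1.0
conv_weights = np.array([0.5, 0.5])
predictor = rng.normal(size=((H * W) ** 2, C))

# Two different question embeddings yield two different combinations
# of the same pair of attention maps.
alpha_a = combine_attentions(
    alpha_tent, alpha_mem, rng.normal(size=C), conv_weights, predictor
)
alpha_b = combine_attentions(
    alpha_tent, alpha_mem, rng.normal(size=C), conv_weights, predictor
)
```

The point of predicting the weights from the question embedding is visible here: the merge is not a fixed function of the two maps but changes with each question.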
MNIST Dialog
• A synthetic visual dialog dataset that highlights a model's ability to resolve visual references.
• Designed to contain ambiguous expressions and strong interdependency among the questions in a dialog.
Results on MNIST Dialog
• Attention-based models perform better.
• The proposed model outperforms the baselines.
• The memory component stabilizes performance at later steps of the dialog.
Qualitative Result on MNIST Dialog
• The model makes semantically reasonable use of attention retrieved from the attention memory.
Parameter Analysis
• Memory addressing coefficients with/without sequential preference.
• t-SNE plots of dynamically predicted weights.
Results on Visual Dialog
• The proposed model outperforms previous work with fewer parameters.
Conclusion
• We proposed a novel visual dialog method that resolves visual references through an attention memory.
• For an ambiguous expression, the proposed method retrieves the associated attention from the memory to resolve the reference of the current target.
• The memory retrieval process allows the model to obtain semantically reasonable attention from its attention history.
• Our model achieves state-of-the-art performance on both synthetic and real datasets.
Visit http://cvlab.postech.ac.kr/research/attmem/ for more information
