Few-Shot Adversarial Learning of
Realistic Neural Talking Head Models

Presented by CGLAB 이명규
2019/08/23
INDEX
01 Introduction
02 Proposed Method
03 Conclusion
Part 01: Introduction
1. Paper Overview
2. Related Work
3. Meta Learning
↳ 1-1 Paper Overview

Journal Info & Paper Overview
• Venue: arXiv (submitted 20 May 2019)
• Authors: Egor Zakharov et al. (Samsung AI Center, Moscow)
• Goal: given the landmark information of a target image, generate a talking head of the source image in real time, so that the person in the source image appears to speak.
↳ 1-2 Related Work

Modeling Human Faces
• Modeling a human head is very difficult: not only the outline but also the hair, mouth, etc. must all be modeled.
• High realism is also required to avoid the uncanny-valley effect.
↳ 1-2 Related Work

Previous Works
Face2Face (Justus Thies et al., CVPR 2016)
↳ 1-2 Related Work

Previous Works
Video-to-Video Synthesis (Wang et al., NeurIPS 2018)
ObamaNet (Supasorn Suwajanakorn et al., SIGGRAPH 2017): "17 hours of Obama voice data!"
↳ 1-3 Meta Learning

Overview of Meta-Learning
Wikipedia, https://www.youtube.com/watch?v=EkAqYbpCYAc

"Meta is a prefix used in English to indicate a concept that is an abstraction behind another concept, used to complete or add to the latter." In epistemology, the prefix "meta" means "about": metadata, for instance, is data about data.
↳ 1-3 Meta Learning

Overview of Meta-Learning
https://www.dreamworks.com/ , https://if-blog.tistory.com/8342

*Meta Learning: "Learning how to learn"
(Figure: meta-learners trained on Task 1, Task 2, … feed a base learner for a future task)
↳ 1-3 Meta Learning

Overview of Meta-Learning
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (https://arxiv.org/abs/1703.03400)
Meta-learning with adaptation outperforms pretraining + fine-tuning the network.
Part 02: Proposed Method
1. Architecture Overview
2. Meta-Learning Stage
3. Few-Shot Learning
4. Implementation Details
↳ 2-1 Architecture Overview

Model & Notations
https://www.slideshare.net/TaesuKim3/pr12165-fewshot-adversarial-learning-of-realistic-neural-talking-head-models
(Figure: model overview. The landmark image enters G as a conditional vector; $t$: frame index, $i$: video index; $P$: projection matrix; $\psi$: parameters of G, $\phi$: parameters of the Embedder, $\theta$: parameters of D)
• $x_i$: the $i$-th video
• $x_i(t)$: the $t$-th frame of the $i$-th video
• $y_i(t)$: the face landmark image extracted from $x_i(t)$
↳ 2-1 Architecture Overview

Model - Embedder $E(x_i(s), y_i(s); \phi)$
• Embeds a video frame $x_i(s)$ and its face landmark image $y_i(s)$ into an $N$-dimensional vector $\hat{e}_i(s)$.
• $\phi$: the Embedder parameters during the meta-learning stage.
• Because the per-frame embeddings are averaged, the network learns a vector representation that is representative of the shots.
• It learns video-level information shared across frames with different head poses (the person's identity).
(Figure: Embedder architecture, multiple FCNs)
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
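A minimal PyTorch sketch of this embed-and-average step, assuming toy shapes and a stand-in conv trunk (the actual Embedder is a residual downsampling network with self-attention; every name below is illustrative):

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Sketch of E(x_i(s), y_i(s); phi): frame + landmark -> N-dim vector."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # The frame and its landmark image are concatenated channel-wise (3+3=6).
        self.trunk = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, embed_dim, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, frame, landmark):
        h = self.trunk(torch.cat([frame, landmark], dim=1))
        return h.sum(dim=(2, 3))  # global sum pooling -> (B, embed_dim)

def video_embedding(embedder, frames, landmarks):
    """e_hat_i: mean of the per-frame embeddings over the sampled frames."""
    embs = [embedder(x[None], y[None]) for x, y in zip(frames, landmarks)]
    return torch.stack(embs).mean(dim=0)
```

Averaging over the sampled frames is what makes the representation shot-level rather than frame-level.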
↳ 2-1 Architecture Overview

Model - Generator $G(y_i(t), \hat{e}_i; \psi, P)$
• Takes a landmark image $y_i(t)$ the Embedder has not seen, together with the embedding vector $\hat{e}_i$, and synthesizes the frame $\hat{x}_i(t)$.
• Trained to maximize the similarity between the generated image and the ground truth.
• During meta-learning, $\psi$ is trained directly, while $\hat{\psi}_i$ is computed from the projection matrix $P$ and $\hat{e}_i$: $\hat{\psi}_i = P\hat{e}_i$.
• $\psi$: person-generic parameters (global)
• $\hat{\psi}_i$: person-specific parameters
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
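To make $\hat{\psi}_i = P\hat{e}_i$ concrete, here is a hedged sketch of how the projected person-specific parameters can drive an AdaIN-style normalization layer inside G; the dimensions and the exact AdaIN placement are assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class ProjectedAdaIN(nn.Module):
    """psi_hat_i = P e_hat_i feeding adaptive instance normalization (sketch)."""
    def __init__(self, embed_dim: int = 512, num_channels: int = 256):
        super().__init__()
        # P maps the N-dim embedding to a per-channel scale and bias.
        self.P = nn.Linear(embed_dim, 2 * num_channels, bias=False)

    def forward(self, feat, e_hat):
        psi_hat = self.P(e_hat)                     # person-specific parameters
        scale, bias = psi_hat.chunk(2, dim=-1)      # (B, C) each
        mu = feat.mean(dim=(2, 3), keepdim=True)
        sigma = feat.std(dim=(2, 3), keepdim=True) + 1e-5
        normalized = (feat - mu) / sigma            # instance-normalized features
        return normalized * scale[:, :, None, None] + bias[:, :, None, None]
```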
↳ 2-1 Architecture Overview

Model - Discriminator $D(x_i(t), y_i(t), i; \theta, W, w_0, b)$
• Takes as input a ground-truth video frame $x_i(t)$, its landmark image $y_i(t)$, and the video index $i$.
• Contains a ConvNet $V(x_i(t), y_i(t); \theta)$ that embeds the input frame and landmark image.
• D finally outputs a single score $r$ that reflects ① whether the input frame is a real image and ② whether it matches the landmarks.
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
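A sketch of the projection-style scoring head; `backbone` stands in for the ConvNet $V$, and $W$ holds one learned $N$-dimensional vector per training video (all shapes are illustrative):

```python
import torch
import torch.nn as nn

class ProjectionDiscriminatorHead(nn.Module):
    """Realism score r = V(x(t), y(t); theta)^T (W_i + w_0) + b (sketch)."""
    def __init__(self, backbone: nn.Module, num_videos: int, embed_dim: int = 512):
        super().__init__()
        self.backbone = backbone                   # stands in for V(., .; theta)
        self.W = nn.Parameter(torch.randn(num_videos, embed_dim) * 0.02)
        self.w0 = nn.Parameter(torch.zeros(embed_dim))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, frame, landmark, video_idx):
        v = self.backbone(frame, landmark)         # (B, embed_dim)
        w = self.W[video_idx] + self.w0            # video-conditioned projection
        return (v * w).sum(dim=-1) + self.b        # score r, shape (B,)
```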
↳ 2-2 Meta-Learning Stage

Training Overview: Meta-Learning Stage
1. In each meta-learning episode, pick one video $i$ and one frame $t$ from that video.
2. Additionally sample $K$ frames other than $t$: $s_1, s_2, \ldots, s_K$.
3. Compute the representative embedding $\hat{e}_i$ for video $i$ as the mean of the $K$ per-frame embeddings $\hat{e}_i(s_k)$:
$$\hat{e}_i = \frac{1}{K}\sum_{k=1}^{K} E(x_i(s_k), y_i(s_k); \phi)$$
4. Generate an image with the Generator: $\hat{x}_i(t) = G(y_i(t), \hat{e}_i; \psi, P)$.
5. Optimize the Embedder and Generator parameters to minimize the loss below (one episode is sketched in code after this list):
$$\mathcal{L}(\phi, \psi, P, \theta, W, w_0, b) = \mathcal{L}_{CNT}(\phi, \psi, P) + \mathcal{L}_{ADV}(\phi, \psi, P, \theta, W, w_0, b) + \mathcal{L}_{MCH}(\phi, W)$$
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
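A hedged sketch of one such episode, assuming `E`, `G`, `D`, and `content_loss` follow the interfaces sketched above; optimizer steps and the feature-matching part of $\mathcal{L}_{ADV}$ (shown two slides later) are omitted:

```python
import torch

def meta_episode_loss(E, G, D, content_loss, video, landmarks, video_idx, K=8):
    """Steps 1-5 of one meta-learning episode (sketch; optimizer steps omitted).

    video, landmarks: (T, 3, H, W) tensors for training video i. D is assumed
    to expose its per-video embedding matrix as D.W, as sketched earlier.
    """
    T = video.shape[0]
    t = int(torch.randint(T, (1,)))                    # step 1: target frame
    support = torch.randint(T, (K,))                   # step 2: K support frames
    # Step 3: e_hat_i is the mean of the K per-frame embeddings.
    e_hat = torch.stack(
        [E(video[k][None], landmarks[k][None]) for k in support]).mean(0)
    # Step 4: synthesize x_hat_i(t) from the held-out landmark y_i(t).
    x_fake = G(landmarks[t][None], e_hat)
    # Step 5: L = L_CNT + L_ADV + L_MCH (feature-matching term omitted here).
    l_cnt = content_loss(x_fake, video[t][None])
    l_adv = -D(x_fake, landmarks[t][None], video_idx).mean()
    l_mch = (e_hat.squeeze(0) - D.W[video_idx]).abs().sum()
    return l_cnt + l_adv + l_mch
```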
↳ 2-2 Meta-Learning Stage

Loss Functions of the Generator
• $\mathcal{L}_{CNT}(\phi, \psi, P)$: the content loss term; measures the distance between the ground-truth image $x_i(t)$ and the generated image $\hat{x}_i(t)$.
  • Perceptual similarity measures from VGG19 and VGGFace.
  • Conceptual similarity: abstract information such as "woman" or "man".
  • Perceptual similarity: more concrete information such as relationships to nearby pixels.
(Recall: $\mathcal{L} = \mathcal{L}_{CNT} + \mathcal{L}_{ADV} + \mathcal{L}_{MCH}$)
(Figure: VGG19 network, VGGFace network)
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
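A sketch of the VGG19 half of the content loss using torchvision's pretrained VGG19; the chosen layer indices are an assumption for illustration, and the paper's analogous VGGFace term is omitted:

```python
from torchvision import models

# Frozen VGG19 feature extractor (gradients still flow back to the generator).
_vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

_LAYERS = {1, 6, 11, 20, 29}  # illustrative post-ReLU layer indices

def vgg_content_loss(x_fake, x_real):
    """L1 distance between VGG19 activations of generated and real frames."""
    loss, h_fake, h_real = 0.0, x_fake, x_real
    for idx, layer in enumerate(_vgg):
        h_fake, h_real = layer(h_fake), layer(h_real)
        if idx in _LAYERS:
            loss = loss + (h_fake - h_real.detach()).abs().mean()
    return loss
```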
↳ 2-2 Meta-Learning Stage

Loss Functions of the Generator
• $\mathcal{L}_{ADV}(\phi, \psi, P, \theta, W, w_0, b) = -D(\hat{x}_i(t), y_i(t), i; \theta, W, w_0, b) + \mathcal{L}_{FM}$
• First term: a score for how real the image looks (the higher, the more realistic).
  • D first embeds its input into an $N$-dimensional vector $V(x_i(t), y_i(t); \theta)$ and then computes the realism score:
$$D(\hat{x}_i(t), y_i(t), i; \theta, W, w_0, b) = V(\hat{x}_i(t), y_i(t); \theta)^T (W_i + w_0) + b$$
  • $w_0$ and $b$ are used to judge the realism of $\hat{x}_i(t)$ and its agreement with the landmark image $y_i(t)$.
• Second term: measures perceptual similarity over the Discriminator's intermediate outputs ("As can be seen, adding feature matching loss substantially improves the performance, while adding perceptual loss further enhances the results." [38])
$$\mathcal{L}_{FM}(G, D_k) = \mathbb{E}_{(\mathbf{s}, \mathbf{x})} \sum_{i=1}^{T} \frac{1}{N_i} \left\| D_k^{(i)}(\mathbf{s}, \mathbf{x}) - D_k^{(i)}(\mathbf{s}, G(\mathbf{s})) \right\|_1$$
(Recall: $\mathcal{L} = \mathcal{L}_{CNT} + \mathcal{L}_{ADV} + \mathcal{L}_{MCH}$)
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
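The feature-matching term in code, assuming a discriminator variant that exposes its intermediate activations as lists of tensors (an assumed interface, in the spirit of the pix2pixHD loss quoted above):

```python
def feature_matching_loss(feats_real, feats_fake):
    """L_FM sketch: mean L1 distance between intermediate discriminator
    activations for real vs. generated inputs, summed over layers."""
    return sum((fr.detach() - ff).abs().mean()
               for fr, ff in zip(feats_real, feats_fake))
```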
↳ 2-2 Meta-Learning Stage

Loss Functions of the Generator
• $\mathcal{L}_{MCH}(\phi, W)$: the embedding matching term; penalizes the $L_1$ distance between $\hat{e}_i$ and $W_i$ so that the two embeddings become similar.
• $W_i$ is the slice of $W$ that holds the representative embedding of video $i$.
(Recall: $\mathcal{L} = \mathcal{L}_{CNT} + \mathcal{L}_{ADV} + \mathcal{L}_{MCH}$)
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
↳ 2-2 Meta-Learning Stage

Loss Functions of the Discriminator
$$\mathcal{L}_{DSC}(\phi, \psi, P, \theta, W, w_0, b) = \max(0,\ 1 + D(\hat{x}_i(t), y_i(t), i; \phi, \psi, \theta, W, w_0, b)) + \max(0,\ 1 - D(x_i(t), y_i(t), i; \theta, W, w_0, b))$$
• Minimizes a hinge loss so that real images get a high realism score $r$ and fake images a low one.
• The Discriminator is updated after the Embedder and Generator parameters have been updated.
• For a fake image, the realism score must fall below −1 for its term to reach 0; for a real image, the score must rise above +1 for its term to reach 0.
https://www.slideshare.net/TaesuKim3/pr12165-fewshot-adversarial-learning-of-realistic-neural-talking-head-models
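The hinge objective transcribes directly into code:

```python
import torch.nn.functional as F

def discriminator_hinge_loss(score_real, score_fake):
    """max(0, 1 + D(fake)) + max(0, 1 - D(real)): each term reaches 0 once
    fake scores fall below -1 and real scores rise above +1."""
    return F.relu(1.0 + score_fake).mean() + F.relu(1.0 - score_real).mean()
```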
↳ 2-3 Few-Shot Learning

Few-Shot Learning by Fine-Tuning: Generator
$\psi$: person-generic parameters (global)
$\psi'$: person-specific parameters (for one particular person)

1. Create an embedding for the new talking head, using the Embedder trained in the meta-learning stage (the $E$ network is reused as-is):
$$\hat{e}_{NEW} = \frac{1}{T}\sum_{t=1}^{T} E(x(t), y(t); \phi)$$
   However, this step reuses G's existing parameters $\psi$ and $P$ unchanged, so an identity gap appears.
2. Hence fine-tuning: use $G'(y(t); \psi, \psi')$ instead of $G(y(t), \hat{e}_{NEW}; \psi, P)$.
   Unlike during meta-learning, $\psi'$ is trained directly, initialized as $\psi' = P\hat{e}_{NEW}$ ($P$ is used only to project the embedding vector).
   Learning the new person therefore starts from the meta-learned prior.
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
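A sketch of this fine-tuning initialization for a new person, reusing the meta-learned $E$ and $P$ (function and variable names are illustrative):

```python
import torch

def init_person_specific(E, P, frames, landmarks):
    """Compute e_hat_new over the T few-shot frames, then initialize
    psi' = P @ e_hat_new as a directly trainable tensor (sketch)."""
    e_new = torch.stack(
        [E(x[None], y[None]) for x, y in zip(frames, landmarks)]).mean(0)
    psi_prime = (P @ e_new.squeeze(0)).detach().clone().requires_grad_(True)
    return e_new, psi_prime
```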
↳ 2-3 Few-Shot Learning

Few-Shot Learning by Fine-Tuning: Discriminator
3. Compute the realism score with $D'(x(t), y(t); \theta, w', b)$: $w'$ is initialized as the sum of $w_0$ and the new speaker's embedding $\hat{e}_{NEW}$, then trained.
   - The ConvNet $V(x(t), y(t); \theta)$ and $b$ are taken from the meta-learning result.
   - As in the meta-learning stage, the realism score is computed after embedding the input into an $N$-dimensional vector:
$$D'(\hat{x}(t), y(t); \theta, w', b) = V(\hat{x}(t), y(t); \theta)^T w' + b$$
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
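And the matching sketch for the discriminator side, where $w'$ starts from $w_0$ plus the new embedding while $\theta$ (inside $V$) and $b$ are reused from meta-learning:

```python
import torch

def init_w_prime(w0, e_new):
    """w' = w_0 + e_hat_new, then trained directly during fine-tuning."""
    return (w0 + e_new.squeeze(0)).detach().clone().requires_grad_(True)

def realism_score(V, frame, landmark, w_prime, b):
    """D'(x(t), y(t); theta, w', b) = V(x(t), y(t); theta)^T w' + b."""
    return V(frame, landmark) @ w_prime + b
```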
↳ 2-3 Few-Shot Learning

Loss Functions
• The Generator parameters $\psi$ and $\psi'$ are trained to minimize:
$$\mathcal{L}'(\psi, \psi', \theta, w', b) = \mathcal{L}'_{CNT}(\psi, \psi') + \mathcal{L}'_{ADV}(\psi, \psi', \theta, w', b)$$
• The Discriminator parameters are trained to minimize:
$$\mathcal{L}'_{DSC}(\psi, \psi', \theta, w', b) = \max(0,\ 1 + D(\hat{x}(t), y(t); \psi, \psi', \theta, w', b)) + \max(0,\ 1 - D(x(t), y(t); \theta, w', b))$$
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
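Putting the two objectives together, one fine-tuning step might look like the sketch below; all callables follow the interfaces assumed earlier, and this is not the authors' training script:

```python
import torch.nn.functional as F

def finetune_step(G_prime, D_prime, content_loss, frame, landmark, opt_g, opt_d):
    """One alternating G/D update on a single few-shot frame (sketch)."""
    # Generator objective: L'_CNT + L'_ADV, with L'_ADV = -D'(fake).
    x_fake = G_prime(landmark)
    loss_g = content_loss(x_fake, frame) - D_prime(x_fake, landmark).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Discriminator objective: hinge loss on the same real/fake pair.
    loss_d = (F.relu(1.0 + D_prime(x_fake.detach(), landmark)) +
              F.relu(1.0 - D_prime(frame, landmark))).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return float(loss_g), float(loss_d)
```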
↳ 2-4 Implementation Details

• Instance Normalization is used instead of Batch Normalization in the downsampling and upsampling layers.
• The Embedder and D both use residual blocks (D adds one more block at the end).
• Both networks use global sum pooling to obtain vectorized outputs.
• Spectral Normalization and self-attention blocks are applied across the layers of all networks.
• Training uses the Adam optimizer.
• Learning rates:
  • Embedder, Generator: $5 \times 10^{-5}$
  • Discriminator: $2 \times 10^{-4}$
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
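A hedged sketch of a downsampling residual block consistent with these details (spectral norm on the convolutions, instance norm in place of batch norm); the kernel sizes and pre-activation layout are assumptions:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ResBlockDown(nn.Module):
    """Downsampling residual block: spectral-normalized convolutions with
    instance normalization, per the details listed on this slide."""
    def __init__(self, cin: int, cout: int):
        super().__init__()
        self.norm1 = nn.InstanceNorm2d(cin)
        self.norm2 = nn.InstanceNorm2d(cout)
        self.conv1 = spectral_norm(nn.Conv2d(cin, cout, 3, padding=1))
        self.conv2 = spectral_norm(nn.Conv2d(cout, cout, 3, padding=1))
        self.skip = spectral_norm(nn.Conv2d(cin, cout, 1))
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        h = self.conv1(torch.relu(self.norm1(x)))
        h = self.conv2(torch.relu(self.norm2(h)))
        return self.pool(h) + self.pool(self.skip(x))

# Optimizers with the learning rates quoted above (parameter lists elided):
# opt_eg = torch.optim.Adam(embedder_and_generator_params, lr=5e-5)
# opt_d  = torch.optim.Adam(discriminator_params, lr=2e-4)
```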
Part 03: Conclusion
1. Experiments
2. Puppeteering Results
3. Time Comparison Results
4. Conclusion
↳ 3-1 Experiments

Evaluation Metrics
• FID (Fréchet Inception Distance): measures perceptual realism.
• Structured Similarity (SSIM): measures low-level image similarity to the ground-truth image.
• Cosine Similarity (CSIM): cosine similarity between identity embedding vectors, used to measure the degree of identity mismatch.
• USER study: show people three photos of the same person, ask them to pick the fake, and compute the success rate; this measures photo realism and identity mismatch at the same time.
https://taecheon.blogspot.com/2019/07/001-few-shot-adversarial-learning-of.html
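Of these metrics, CSIM is the simplest to state in code; a sketch assuming some pretrained face-recognition embedder `face_net` supplies the identity vectors:

```python
import torch.nn.functional as F

def csim(face_net, x_fake, x_real):
    """Cosine similarity between identity embeddings of generated and
    ground-truth frames; higher means less identity mismatch."""
    return F.cosine_similarity(face_net(x_fake), face_net(x_real), dim=-1).mean()
```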
↳ 3-2 Puppeteering Results
↳ 3-3 Time Comparison Results
↳ 3-4 Conclusion

Conclusion & Limitations
• Conclusion
  • Proposes a framework that uses meta-learning and GANs to create talking heads that look very close to real.
  • Few-shot learning yields good results from only a small number of images.
• Limitations
  • Aspects of real facial expressions such as emotion and gaze are not reproduced well.
  • Differences in landmarks can make the result look like an unintended person (landmark adaptation is needed).
Thank you for listening.
Email: brstar96@naver.com (or brstar96@soongsil.ac.kr)
Mobile: +82-10-8234-3179