Computer Vision Research
2023.02.13
오현우
2020~2021
Computer Graphics, Computer Vision
2022~2023
Computer Vision
Image Segmentation
3D Reconstruction
Image Restoration
Lip Generation
• YOLACT: Real-time Instance Segmentation
Image Segmentation
Instance Segmentation
2020.04~2020.08
• MODNet: Trimap-Free Portrait Matting in Real Time
• Real-Time High-Resolution Background Matting(BGMv2)
Image Segmentation
Image Matting
2020.09~2020.12
• PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization
• Expressive Body Capture: 3D Hands, Face, and Body from a Single Image(SMPL eXpressive)
• SMPL: A Skinned Multi-Person Linear Model
3D Reconstruction
Human Digitization
2020.06~2020.10
(Figures: SMPL, SMPL-X, and PIFu results)
• Unsupervised Real-world Image Super Resolution via Domain-distance Aware Training(DASR)
• Designing a Practical Degradation Model for Deep Blind Image Super-Resolution(BSR GAN)
• SwinIR: Image Restoration Using Swin Transformer
• Towards Robust Blind Face Restoration with Codebook Lookup Transformer(CodeFormer)
• Analyzed the strengths and weaknesses of each model through inference
Image Restoration
Super Resolution
2021.05~2022.11
• FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
• MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
• Face2Face: Real-time Face Capture and Reenactment of RGB Videos
Lip Generation
Text to Mesh
Pipeline (figure): text input ("안녕하세요") → FastSpeech 2 → phonemes → visemes → 3D mesh
Phonemes: units of sound
Visemes: the visual counterparts of phonemes
2021.10~2022.01
Lip Generation
• ObamaNet: Photo-realistic lip-sync from text
• Image-to-Image Translation with Conditional Adversarial Nets(Pix2Pix)
Speech to Image
Pipeline (figure): text input ("안녕하세요") → Char2Wav → LSTM → Pix2Pix → output frames
2022.02~2022.05
Lip Generation
• A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild(Wav2Lip)
Speech to Image
2022.06~Now
Lip Generation
• A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild(Wav2Lip)
• SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
• Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Speech to Image
2022.06~Now
• Data Parallelization
• Model Parallelization
Aimed to reduce training time, the biggest bottleneck of DNNs, by making use of multiple GPUs
Appendix
DNN Parallelization
2018.02~2019.12
• Hough Transform
• Labeling
• SIFT
• SURF
Appendix
Classical Method
2016.02~2017.10
• Multimodal learning
• Audio-Image
• Lip Generation
• Follow-up research on Wav2Lip
• Generative model
• Stable Diffusion
• Improve generation quality by changing how the audio and image representations are extracted
• Improve generation quality by replacing the existing GAN-based generator with other generative techniques such as diffusion models
• Overcome the limits of 2D images by estimating 3D information to produce results closer to reality
• Representing Scenes as Neural Radiance Fields for View Synthesis(NeRF)
• Learning Transferable Visual Models From Natural Language Supervision(CLIP)
• Denoising Diffusion Probabilistic Models(Diffusion Model)
• High-Resolution Image Synthesis with Latent Diffusion Models
Research Plan
NeRF: Representing Scenes as
Neural Radiance Fields for View Synthesis
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik,
Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng
AI Labs, Image Processing Part
오현우 (Staff)
2023.01.31
3D Reconstruction vs Volume Rendering
• Reality Capture
https://www.youtube.com/watch?v=9kIPixG8GHA
(Two-column comparison: 3D Reconstruction vs. Volume Rendering)
Volume rendering: a technique that displays 3-D sampled data as a 2-D projection.
NeRF belongs to the volume-rendering side.
1) March camera rays through the scene to generate a sampled set of 3D points
2) Use those points and their corresponding 2D viewing directions as input
to the neural network to produce an output set of colors and densities
3) Use classical volume rendering techniques to accumulate those colors and densities into a 2D image.
NeRF Model - 1
F_Θ : (X, d) → (c, σ)
X = (x, y, z): the x, y, z coordinates of the points along the ray (the locations the ray passes through)
d = (θ, φ): the viewing direction
c = (r, g, b): the emitted color
σ: the density; as the density grows the object becomes more opaque (things behind it become hard to see), and as the density shrinks the object becomes more transparent
NeRF Model - 2
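As a rough sketch of F_Θ, the snippet below is a minimal PyTorch MLP that maps a 3-D point and a 3-D viewing direction to (c, σ). The class and layer names are illustrative, the network is much smaller than the paper's 8×256 MLP, and it takes raw rather than positionally encoded inputs (see the Positional Encoding slide below).

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """A minimal sketch of F_theta: (X, d) -> (c, sigma)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)      # density depends on position only
        self.rgb_head = nn.Sequential(              # color also sees the view direction
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3),
        )

    def forward(self, x, d):
        h = self.backbone(x)                        # features of the 3-D point
        sigma = torch.relu(self.sigma_head(h))      # sigma >= 0
        rgb = torch.sigmoid(self.rgb_head(torch.cat([h, d], dim=-1)))  # rgb in [0, 1]
        return rgb, sigma

# Usage: 1024 sample points with their viewing directions.
rgb, sigma = TinyNeRF()(torch.rand(1024, 3), torch.rand(1024, 3))
```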
NeRF Model - 3
(Architecture figure: the position X = (x, y, z) and the direction d = (θ, φ) each pass through positional encoding; the MLP predicts the density σ from the encoded position alone, then concatenates the encoded direction to predict the color c = (r, g, b).)
NeRF Model - 4
Volume Rendering
(Figure: sample points along a camera ray from near to far, each with a predicted density and RGB; one annotation marks the camera ray, another notes that the sample locations t_i are, roughly speaking, drawn by random (stratified) sampling.)
Negating the density and exponentiating it means that the larger the accumulated density in front of a sample's location, the smaller that sample's weight becomes.
• The colors and density values produced by the model along a single ray are composited into one pixel through this volume-rendering step.
• The composited pixel RGB is compared with the ground-truth image's pixel RGB using an MSE loss, and the network is trained by backpropagation.
COARSE MODEL
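A minimal NumPy sketch of this compositing step for a single ray, assuming per-sample colors `rgb`, densities `sigma`, and depths `t_vals` are already available (the function name `volume_render` is illustrative). It computes the transmittance T_i = exp(−Σ_{j<i} σ_j δ_j) and the weights w_i = T_i (1 − exp(−σ_i δ_i)) described above.

```python
import numpy as np

def volume_render(rgb, sigma, t_vals):
    """Composite per-sample colors/densities along one ray into a pixel.

    rgb    : (N, 3) colors predicted at the N sample points
    sigma  : (N,)   densities predicted at the N sample points
    t_vals : (N,)   depths of the sample points along the ray (near -> far)
    """
    # Distances between adjacent samples (last one padded with a large value).
    deltas = np.append(np.diff(t_vals), 1e10)

    # alpha_i = 1 - exp(-sigma_i * delta_i): opacity contributed by sample i.
    alpha = 1.0 - np.exp(-sigma * deltas)

    # T_i = exp(-sum_{j<i} sigma_j * delta_j): how much light survives everything
    # in front of sample i; large densities in front shrink the weights behind them.
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigma * deltas)[:-1]]))

    weights = trans * alpha                         # w_i = T_i * alpha_i
    pixel_rgb = (weights[:, None] * rgb).sum(axis=0)
    return pixel_rgb, weights
```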
Positional Encoding
Applied so the model can capture high-frequency detail; it can be viewed as a kind of data augmentation of the inputs.
(Figure: the same architecture as before, with positional encoding applied to both X = (x, y, z) and d = (θ, φ).)
Position: 3 dimensions → 60 dimensions
Direction: 3 dimensions → 24 dimensions
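A short sketch of the encoding γ, assuming the standard sin/cos frequency bands: with L = 10 bands the 3-D position maps to 3 × 2 × 10 = 60 dimensions, and with L = 4 bands the 3-D direction maps to 3 × 2 × 4 = 24 dimensions.

```python
import numpy as np

def positional_encoding(p, num_freqs):
    """gamma(p): map each coordinate to (sin, cos) pairs at increasing frequencies.

    p         : (D,) input vector (e.g. D=3 for position or viewing direction)
    num_freqs : L, number of frequency bands
    returns   : (D * 2 * L,) encoded vector
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi    # 2^k * pi, k = 0..L-1
    angles = p[:, None] * freqs[None, :]           # (D, L)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()

# L=10 for the 3-D position -> 60 dims, L=4 for the 3-D direction -> 24 dims.
gamma_x = positional_encoding(np.array([0.1, -0.3, 0.5]), num_freqs=10)  # shape (60,)
gamma_d = positional_encoding(np.array([0.0, 0.7, 0.7]), num_freqs=4)    # shape (24,)
```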
Hierarchical Volume Sampling
Normalize the weights from the coarse pass and draw additional samples according to them, building a FINE MODEL that is separate from the COARSE MODEL.
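A small sketch of the resampling step, assuming the usual inverse-transform sampling over the normalized coarse weights (the helper name `sample_pdf` and the small epsilon constants are illustrative).

```python
import numpy as np

def sample_pdf(bins, weights, n_fine, rng=np.random.default_rng(0)):
    """Draw extra sample locations along a ray in proportion to the
    coarse model's weights (hierarchical volume sampling).

    bins    : (n_coarse+1,) bin edges along the ray (t values)
    weights : (n_coarse,)   per-bin weights from the coarse pass
    """
    # Normalize the weights into a probability distribution over bins.
    pdf = (weights + 1e-5) / np.sum(weights + 1e-5)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])

    # Inverse-transform sampling: map uniform samples through the CDF.
    u = rng.uniform(size=n_fine)
    idx = np.searchsorted(cdf, u, side="right") - 1
    idx = np.clip(idx, 0, len(weights) - 1)

    # Place each sample proportionally inside its bin.
    denom = np.where(cdf[idx + 1] - cdf[idx] < 1e-8, 1.0, cdf[idx + 1] - cdf[idx])
    frac = (u - cdf[idx]) / denom
    return bins[idx] + frac * (bins[idx + 1] - bins[idx])
```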
Loss Function
The RGB values volume-rendered by both the COARSE MODEL and the FINE MODEL are compared against the ground-truth RGB values to compute the error.
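In the notation of the NeRF paper, this amounts to summing the squared error of both renderings over the set of rays $\mathcal{R}$ in a batch:

$$\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \Big[ \big\lVert \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \big\rVert_2^2 + \big\lVert \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \big\rVert_2^2 \Big]$$

where $\hat{C}_c$ and $\hat{C}_f$ are the coarse and fine renderings and $C(\mathbf{r})$ is the ground-truth pixel color.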
NeRF Limitations
1. Slow speed
NeRF is slow both to train and to render.
A single NeRF model can represent only one object.
One training run (200k~300k iterations) takes roughly 1~2 days.
-> DeRF(21CVPR), NeRF++, Plenoxels(22CVPR)
2. NeRF only performs well on static scenes
Scenes containing moving objects produce a lot of noise.
-> D-NeRF(21CVPR), Nerfies(21ICCV), HyperNeRF
NeRF Limitations
3. NeRF only performs well on images captured under the same conditions
Even for a static object, shading, color, and lighting can differ with weather and time of day, and unless you shoot in a studio, such data is actually the norm in the real world.
-> NeRV(21CVPR), NeRD(21CVPR), NeRF in the Wild(21CVPR)
4. NeRF is not a general model
A single NeRF model can only reproduce a single object.
-> GIRAFFE(21CVPR), pixel-NeRF(22CVPR) ...
NeRF Limitations
5. Training sets need too many viewpoints
The synthetic training dataset fed into NeRF contains 100 images.
Taking 100 photos to learn a single object is inefficient; research on rendering an object from only a few photos is needed.
-> pixel-NeRF(22CVPR), DietNeRF(21ICCV), Instant-NGP
6. Camera parameters are required as NeRF inputs
NeRF needs the intrinsic and extrinsic camera parameters to know where the camera is.
That is too much to ask when an ordinary user captures an object with a smartphone camera, so research has moved toward estimating the pose or learning the pose itself.
-> iNeRF(21IROS), NeRF--, GNeRF(21ICCV), BARF(21ICCV), SCNeRF(21ICCV)
Semi-Supervised Learning
Entropy, Relative Entropy and Mutual Information*
*Elements of Information Theory, Thomas M. Cover and Joy A. Thomas
Definition 1: The entropy of a discrete random variable $X$ with p.m.f. $p(x)$ is defined as
$$H(X) := -\sum_x p(x) \log p(x)$$
Convention: $0 \log 0 = 0$.
Remark 1: The entropy $H(X)$ is a measure of the average uncertainty of the random variable $X$. We can also write
$$H(X) = -\sum_x p(x) \log p(x) = \sum_x p(x) \log \frac{1}{p(x)}$$
Lemma 1: $H(X) \ge 0$
Proof: By definition,
$$H(X) = -\sum_x p(x) \log p(x) = \sum_x p(x)\,(-\log p(x)) \ge 0$$
since $0 \le p(x) \le 1$ implies $-\log p(x) \ge 0$.
Definition 2: The joint entropy $H(X, Y)$ of a pair of discrete random variables $(X, Y)$ with joint p.m.f. $p(x, y)$ is defined as
$$H(X, Y) = -\sum_x \sum_y p(x, y) \log p(x, y)$$
Clearly $H(X, Y) \ge 0$.
Definition 3: If $(X, Y) \sim p(x, y)$, then the conditional entropy $H(Y \mid X)$ is defined as
$$H(Y \mid X) := \sum_x p(x)\, H(Y \mid X = x)$$
$$= \sum_x p(x)\,(-1) \sum_y p(y \mid x) \log p(y \mid x)$$
$$= -\sum_x \sum_y p(y \mid x)\, p(x) \log p(y \mid x)$$
$$= -\sum_x \sum_y p(x, y) \log p(y \mid x)$$
$$= -E_{p(x,y)}\big[\log p(Y \mid X)\big]$$
Theorem (Chain Rule): $H(X, Y) = H(Y \mid X) + H(X)$
Proof:
$$H(X, Y) = -\sum_x \sum_y p(x, y) \log p(x, y)$$
$$= -\sum_x \sum_y p(x, y) \log\big(p(y \mid x)\, p(x)\big)$$
$$= -\sum_x \sum_y p(x, y) \log p(y \mid x) - \sum_x \sum_y p(x, y) \log p(x)$$
$$= H(Y \mid X) - \sum_x p(x) \log p(x)$$
$$= H(Y \mid X) + H(X)$$
$$\Rightarrow H(X, Y) = H(Y \mid X) + H(X)$$
Corollary:
$$H(X, Y, Z) = H(Y \mid X, Z) + H(X \mid Z) + H(Z)$$
Proof omitted.
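A quick NumPy check of the chain rule on a small, arbitrarily chosen joint p.m.f. (the numbers below are purely illustrative):

```python
import numpy as np

def H(p):
    """Shannon entropy of a (possibly joint) distribution given as an array."""
    p = p[p > 0]                       # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

# A small joint p.m.f. p(x, y) over 2 x 3 outcomes.
pxy = np.array([[0.10, 0.20, 0.10],
                [0.25, 0.05, 0.30]])
px = pxy.sum(axis=1)                                    # marginal p(x)
H_Y_given_X = sum(px[i] * H(pxy[i] / px[i]) for i in range(len(px)))

# Chain rule: H(X, Y) = H(Y | X) + H(X)
assert np.isclose(H(pxy), H_Y_given_X + H(px))
```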
Entropy, Relative Entropy and Mutual Information
Definition 4: The Kullback-Leibler divergence, or relative entropy, between two probability mass functions $p(x)$ and $q(x)$ is defined as
$$D(p \,\|\, q) := \sum_x p(x) \log \frac{p(x)}{q(x)} = E_p\left[\log \frac{p(x)}{q(x)}\right]$$
Remark 2: $D(p \,\|\, q) \ne D(q \,\|\, p)$
Fact: $D(p \,\|\, q) \ge 0$
Proof:
$$D(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = \sum_x p(x)\,(-1) \log \frac{q(x)}{p(x)} \ge (-1) \log \sum_x p(x) \frac{q(x)}{p(x)} = (-1) \log 1 = 0$$
by Jensen's inequality for the convex function $-\log$.
Definition 5: The cross-entropy of p.m.f.s $p$ and $q$ is defined as
$$H_q(p) := -\sum_x p(x) \log q(x)$$
where $p(x)$ is unknown and $q(x)$ is an approximating p.m.f.
Observation:
(1)
$$D(p \,\|\, q) := \sum_x p(x) \log \frac{p(x)}{q(x)} = \sum_x p(x) \log p(x) - \sum_x p(x) \log q(x) = -H(p) + H_q(p)$$
(2) $0 \le D(p \,\|\, q) = H_q(p) - H(p)$, i.e. $H_q(p) \ge H(p)$
(3) Since $p(x)$ is fixed, minimizing $D(p \,\|\, q)$ ⇔ minimizing $H_q(p)$
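A short NumPy check of observations (1) and (2) on two arbitrary example distributions (the numbers are illustrative):

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])     # "true" distribution
q = np.array([0.5, 0.25, 0.25])   # approximating distribution

H_p  = -np.sum(p * np.log(p))          # entropy H(p)
H_qp = -np.sum(p * np.log(q))          # cross-entropy H_q(p)
D_pq =  np.sum(p * np.log(p / q))      # KL divergence D(p || q)

assert np.isclose(H_qp, H_p + D_pq)    # H_q(p) = H(p) + D(p||q) >= H(p)
assert D_pq >= 0
assert not np.isclose(D_pq, np.sum(q * np.log(q / p)))   # D(p||q) != D(q||p)
```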
Entropy, Relative Entropy and Mutual Information
Definition 6: The mutual information $I(X; Y)$ of two random variables $X$ and $Y$ is defined as
$$I(X; Y) := \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = E_{p(x,y)}\left[\log \frac{p(X, Y)}{p(X)\, p(Y)}\right]$$
Observation: $I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$
Proof:
$$I(X; Y) = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = \sum_x \sum_y p(x, y) \log \frac{p(x \mid y)\, p(y)}{p(x)\, p(y)} = \sum_x \sum_y p(x, y) \log \frac{p(x \mid y)}{p(x)}$$
$$= \sum_x \sum_y p(x, y) \log p(x \mid y) - \sum_x \sum_y p(x, y) \log p(x) = -H(X \mid Y) + H(X) = -H(Y \mid X) + H(Y)$$
Remark 3:
$I(X; Y)$ is the reduction in the uncertainty of $X$ due to the information of $Y$ (and of $Y$ due to the information of $X$).
Proposition:
$I(X; Y) \ge 0$, with equality ⇔ $X$ and $Y$ are independent.
Proof:
$$I(X; Y) = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} =: D\big(p(x, y) \,\|\, p(x)\, p(y)\big) \ge 0$$
Corollary: $I(X; Y \mid Z) \ge 0$
Proof: $I(X; Y \mid Z) := D\big(p(x, y \mid z) \,\|\, p(x \mid z)\, p(y \mid z)\big) \ge 0$ because $D(p \,\|\, q) \ge 0$.
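A small NumPy check that $I(X; Y) = D\big(p(x,y) \,\|\, p(x)p(y)\big) = H(X) + H(Y) - H(X, Y) \ge 0$ on an arbitrary example joint distribution:

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

pxy = np.array([[0.30, 0.10],
                [0.05, 0.55]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# I(X;Y) as the KL divergence D( p(x,y) || p(x)p(y) )
I = np.sum(pxy * np.log2(pxy / np.outer(px, py)))

# Equivalent form via entropies: H(X) + H(Y) - H(X,Y)
assert np.isclose(I, H(px) + H(py) - H(pxy))
assert I >= 0
```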
Entropy, Relative Entropy and Mutual Information
Contrastive Learning
Problem: For a given massive data set $X = \{x_1, x_2, \dots, x_T\}$ without labels, how do we learn an encoder $f_\theta(\cdot)$ (a representation) that will be used for downstream tasks such as classification or clustering?
Idea: For each data point $x \in X$,
(1) Randomly draw a positive sample $x^+$ from $x$
(2) Randomly draw $N - 1$ negative samples $\{x_j^- ;\ j = 1, 2, \dots, N - 1\}$ from different classes
(3) Choose any type of neural network and learn the encoder $f_\theta(\cdot)$
(4) Choose a score function, for example $f_\theta(x)^\top f_\theta(x^+)$ and $f_\theta(x)^\top f_\theta(x_j^-)$
(5) A loss function given 1 positive sample and $N - 1$ negative samples is
$$L(\theta) = -E_x\left[\log \frac{\exp\big(f_\theta(x)^\top f_\theta(x^+)\big)}{\exp\big(f_\theta(x)^\top f_\theta(x^+)\big) + \sum_{j=1}^{N-1} \exp\big(f_\theta(x)^\top f_\theta(x_j^-)\big)}\right]$$
This is the cross-entropy loss for an $N$-class softmax classifier, i.e. the encoder learns to find the positive sample among the $N$ samples.
(6) For a sample $\{x_l\}_{l=1}^{M} \subset X := \{x_1, x_2, \dots, x_T\}$ of batch size $M$, use the empirical loss function
$$L_M(\theta) = -\frac{1}{M} \sum_{l=1}^{M} \log \frac{\exp\big(f_\theta(x_l)^\top f_\theta(x_l^+)\big)}{\exp\big(f_\theta(x_l)^\top f_\theta(x_l^+)\big) + \sum_{j=1}^{N-1} \exp\big(f_\theta(x_l)^\top f_\theta(x_j^-)\big)}$$
and update $\theta$: $\theta = \arg\min_\theta L_M(\theta)$ by a gradient descent algorithm.
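A minimal PyTorch sketch of the loss in step (5), assuming a dot-product score and treating the positive as class 0 of an $N$-way softmax (the function name `info_nce_loss` is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(f_x, f_pos, f_neg):
    """Contrastive (InfoNCE-style) loss with a dot-product score, as in step (5).

    f_x   : (M, D)       encoder outputs f_theta(x_l) for the batch
    f_pos : (M, D)       encoder outputs of the positive samples
    f_neg : (M, N-1, D)  encoder outputs of the N-1 negative samples
    """
    pos_score = (f_x * f_pos).sum(dim=-1, keepdim=True)       # (M, 1)
    neg_score = torch.einsum("md,mnd->mn", f_x, f_neg)        # (M, N-1)
    logits = torch.cat([pos_score, neg_score], dim=1)         # (M, N)

    # Cross-entropy over N classes where class 0 is always the positive:
    # exactly "learn to find the positive sample among the N samples".
    target = torch.zeros(f_x.size(0), dtype=torch.long)
    return F.cross_entropy(logits, target)

# Usage with random features: M=8, D=128, N-1=15 negatives.
loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 15, 128))
```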
Contrastive Learning
Architecture (figure): $x$, $x^+$, and the $x_j^-$ pass through a shared CNN encoder $f_\theta(\cdot)$, whose outputs feed the contrastive loss.
Algorithm:
(0) Input: batch size $M$, network structure
(1) Randomly sample $\{x_l\}_{l=1}^{M}$ from $X := \{x_t\}_{t=1}^{T}$
(2) Randomly initialize all parameters
(3) For each data point $x_l$, randomly draw one positive sample $x_l^+$ from $x_l$ and $N - 1$ negative samples $\{x_j^-(l)\}_{j=1}^{N-1}$ from different classes
(4) Compute the encoder outputs, for $l = 1, 2, \dots, M$: $f_\theta(x_l)$, $f_\theta(x_l^+)$, $\{f_\theta(x_j^-(l))\}_{j=1}^{N-1}$
(5) Use the empirical cross-entropy loss
$$L_M(\theta) = -\frac{1}{M} \sum_{l=1}^{M} \log \frac{\exp\big(f_\theta(x_l)^\top f_\theta(x_l^+)\big)}{\exp\big(f_\theta(x_l)^\top f_\theta(x_l^+)\big) + \sum_{j=1}^{N-1} \exp\big(f_\theta(x_l)^\top f_\theta(x_j^-)\big)}$$
and update all parameters by a gradient descent algorithm: $\theta = \arg\min_\theta L_M(\theta)$
(6) Repeat steps (1)~(5) until all updated parameters converge or change little within a given tolerance ⇒ many epochs