Susang Kim(healess1@gmail.com)
Video Understanding(2)
Learning Video Representations from Correspondence Proposals
Video Architecture (ReCap)
Using an ImageNet pre-trained backbone, the paper compares existing architectures (a–d) with the proposed I3D (e), showing that performance can be improved by changing the network structure.
Kinetics Dataset (ReCap)
Kinetics was built on the observation that a model pre-trained on ImageNet (1,000 images / 1,000 categories) performs well not only on classification but also on object detection, segmentation, and other tasks.
For action recognition, fine-tuning a model pre-trained on Kinetics on the existing HMDB-51 and UCF-101 benchmarks achieved SOTA, demonstrating how important a large amount of training data is.
Kinetics Dataset: 650,000 videos annotated with action-centric classes
(single-person actions, person-to-person actions, and actions involving objects),
with about 600 video clips per class, each 10 seconds long.
After the initial 400-class release, extended versions with 600 and 700 classes were also published.
HMDB-51 Dataset (ReCap)
Released at ICCV 2011, a human-motion dataset of 6,849 video clips in 51 action categories; each category contains at least 101 clips.
http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/#introduction
Action Recognition: The Second Paper
A paper from Stanford/Adobe presented at CVPR 2019. Using RGB input only, it achieves SOTA with CPNet (Correspondence Proposals), which captures the correspondence between the semantic features of 2D frames and temporal information.
Action Recognition: classifying which action a person performs in a given video (a video clip goes in, a predicted action class comes out).
Specific objects in image frames remain correlated across frames.
https://www.youtube.com/watch?v=4IInDT_S0ow&t=57s
Correspondence in Videos
Visualization of CP Module
Visualization of the correspondence between specific features as the frames change.
https://www.youtube.com/watch?v=4IInDT_S0ow&t=57s
Problems with Previous Work
We constructed a toy video dataset where
previous RGB only methods fail in
learning long-range motion. Through this
extremely simple dataset, we show the
drawbacks of previous methods and the
advantage of our architecture
Action videos with a low frame rate and fast motion generally have low recognition accuracy; CPNet improves this case.
Toy dataset with four classes (Up, Down, Left, Right):
a 2x2 white dot moves over a 32x32 black background, shifting by roughly 7 to 9 pixels per frame.
Train 1000 / Validation 200 clips (a generation sketch follows below).
A Failing of Several Previous Methods
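Below is a minimal NumPy sketch of how such a toy clip could be generated. It is an assumption about the setup, not the authors' released code; the fixed 8-pixel step and the wrap-around behaviour are simplifications.

import numpy as np

def make_toy_clip(direction, num_frames=8, img=32, dot=2, step=8, rng=None):
    # One clip: a 2x2 white dot on a 32x32 black background that moves `step`
    # pixels per frame in one of four directions (positions wrap around the border).
    rng = rng or np.random.default_rng()
    dy, dx = {'up': (-step, 0), 'down': (step, 0),
              'left': (0, -step), 'right': (0, step)}[direction]
    y, x = int(rng.integers(0, img)), int(rng.integers(0, img))  # random start position
    frames = np.zeros((num_frames, img, img, 3), dtype=np.float32)
    for t in range(num_frames):
        yy, xx = (y + t * dy) % img, (x + t * dx) % img
        frames[t, yy:yy + dot, xx:xx + dot, :] = 1.0             # draw the white dot
    return frames

# Labels are the movement directions: 1,000 training and 200 validation clips.
directions = ['up', 'down', 'left', 'right']
train = [(make_toy_clip(directions[i % 4]), i % 4) for i in range(1000)]
val = [(make_toy_clip(directions[i % 4]), i % 4) for i in range(200)]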
Correspondence Proposals Module
https://arxiv.org/pdf/1905.07853.pdf
Centered on the purple point (feature), the CP module selects the k most similar features (k-NN) as meaningful correspondences, which makes it possible to classify more accurately across multiple frames.
Calculate the similarity of all feature point pairs
The C-dimensional feature vectors at all THW positions are paired, and a similarity score is computed for each pair using a negative Euclidean distance metric (used for k-NN). (T: time, H: height, W: width, C: channels)
s(i, j) = −‖fᵢ − fⱼ‖₂ (negative L2 distance between feature vectors fᵢ and fⱼ)
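A small NumPy sketch of this step (not the authors' code): flatten the feature map to a THW x C matrix and compute the negative pairwise L2 distances.

import numpy as np

def pairwise_similarity(features):
    # features: (T, H, W, C) feature map.
    # Returns the (THW, THW) matrix of negative L2 distances used as similarity scores.
    T, H, W, C = features.shape
    f = features.reshape(T * H * W, C)                 # THW x C
    sq = np.sum(f ** 2, axis=1)                        # squared norms of every feature
    d2 = sq[:, None] + sq[None, :] - 2.0 * (f @ f.T)   # squared pairwise L2 distances
    return -np.sqrt(np.maximum(d2, 0.0))               # negative L2 distance = similarity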
Set the diagonal block matrices
The similarity scores form a THW x THW matrix; pairs inside the same frame (the H x W positions sharing the same T) are excluded by setting their scores to −∞.
For each of the THW positions, the k features with the highest similarity scores are selected (kept as a lookup table).
Each row therefore yields a THW x k index matrix, where every selected feature carries its index i.
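Continuing the sketch above, same-frame pairs can be masked out and the k best proposals per position collected as an index table (again an illustrative sketch, not the paper's CUDA implementation).

import numpy as np

def correspondence_proposals(sim, T, H, W, k):
    # sim: (THW, THW) similarity matrix from pairwise_similarity.
    # Returns a (THW, k) table with the indices of the k nearest neighbours in *other* frames.
    n, hw = T * H * W, H * W
    frame_id = np.arange(n) // hw                       # frame index of every position
    same_frame = frame_id[:, None] == frame_id[None, :]
    sim = np.where(same_frame, -np.inf, sim)            # diagonal blocks -> -inf
    nn_idx = np.argpartition(-sim, k, axis=1)[:, :k]    # k largest scores per row
    return nn_idx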
Correspondence Embedding layer
For each position i₀, the k most relevant features are found, and their index (spatio-temporal position) offsets together with the feature values are fed through an MLP to produce the CP vector.
CP Module Code
# Core of the CP module from the authors' TensorFlow 1.x implementation; knn, tf_grouping,
# tf_util and get_coord are custom ops/utilities from the CPNet repository. The excerpt
# omits the preliminary reshaping that defines net, batch_size, num_frames, new_height,
# new_width, num_channels and end_points.
def cp_module(video, k, mlp, scope, mlp0=None, is_training=None, bn_decay=None, weight_decay=None,
              data_format='NHWC', distance='l2', activation_fn=None, shrink_ratio=None, freeze_bn=False):
    ...
    # 1. k-NN in feature space: indices of the k most similar features for every THW position
    nn_idx = knn.knn(net, k, new_height * new_width)
    # 2. Group each position's feature with the features of its k proposals
    net_expand = tf.tile(tf.expand_dims(net, axis=2), [1, 1, k, 1])
    net_grouped = tf_grouping.group_point(net, nn_idx)
    # 3. (t, h, w) coordinates of every position and the displacement to each proposal
    coord = get_coord(tf.reshape(video, [batch_size, -1, new_height, new_width,
                                         num_channels_bottleneck]))
    coord_expand = tf.tile(tf.expand_dims(coord, axis=2), [1, 1, k, 1])
    coord_grouped = tf_grouping.group_point(coord, nn_idx)
    coord_diff = coord_grouped - coord_expand
    end_points['coord'] = {'coord': coord, 'coord_grouped': coord_grouped,
                           'coord_diff': coord_diff}
    # 4. Correspondence Embedding: shared MLP over [displacement, feature_i, feature_j],
    #    then max-pooling over the k proposals
    net = tf.concat([coord_diff, net_expand, net_grouped], axis=-1)
    with tf.variable_scope(scope) as sc:
        for i, num_out_channel in enumerate(mlp):
            net = tf_util.conv2d(net, num_out_channel, [1, 1], padding='VALID',
                                 stride=[1, 1], bn=True, is_training=is_training,
                                 scope='conv%d' % (i), bn_decay=bn_decay, weight_decay=weight_decay,
                                 data_format=data_format, freeze_bn=freeze_bn)
    end_points['before_max'] = net
    net = tf.reduce_max(net, axis=[2], keepdims=True, name='maxpool')
    end_points['after_max'] = net
    net = tf.reshape(net, [batch_size, num_frames, new_height, new_width, mlp[-1]])
    # 5. Final 1x1x1 conv plus a batch norm whose gamma starts at 0, so the module's output
    #    is initially zero when it is added back into the residual block
    with tf.variable_scope(scope) as sc:
        net = tf_util.conv3d(net, num_channels, [1, 1, 1], stride=[1, 1, 1],
                             bn=False, activation_fn=None, weight_decay=weight_decay, scope='conv_final')
        net = tf.contrib.layers.batch_norm(net, center=True, scale=True,
            is_training=is_training if not freeze_bn else tf.constant(False, shape=(), dtype=tf.bool),
            decay=bn_decay, updates_collections=None, scope='bn_final', data_format=data_format,
            param_initializers={'gamma': tf.constant_initializer(0., dtype=tf.float32)},
            trainable=not freeze_bn)
    return net, end_points
CPNet Architecture in ResNet
As with applying CP modules to ResNet-101, a CP module can be attached after the last CNN layer of a residual block (just before the final ReLU).
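A rough sketch of that placement, assuming a hypothetical residual_block_body helper and the cp_module signature shown earlier; the exact wiring in the released model may differ.

def residual_block_with_cp(x, k, mlp, scope, is_training):
    # Standard residual branch (conv -> BN -> ReLU -> conv -> BN); helper assumed.
    branch = residual_block_body(x, scope=scope + '/body', is_training=is_training)
    # CP module over the same feature map; its final BN gamma starts at zero,
    # so training begins from the behaviour of the plain residual block.
    cp_out, _ = cp_module(x, k=k, mlp=mlp, scope=scope + '/cp', is_training=is_training)
    out = x + branch + cp_out      # merged before the last activation
    return tf.nn.relu(out)         # the ReLU comes after the CP output is added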
Comparison with Other Architectures
Using RGB only, CPNet outperforms two-stream networks such as I3D that additionally use optical flow, while using fewer parameters than existing RGB-based networks.
Model Run Time
Spatial size 112 x 112, ResNet-34, NVIDIA GTX 1080 Ti GPU with TensorFlow and cuDNN.
With a batch size of 1, the processing speed is 10.1 videos/s for a frame length of 8 and 3.9 videos/s for a frame length of 32. The number of videos that can be proces...
The time complexity of the CP module is O((THW) log(THW) · (C + k)) (a numeric illustration follows below).
Total run time also grows as the batch size increases.
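A quick numeric illustration of that complexity term (H, W, C and k are assumed values, not taken from the slide): it shows why going from 8 to 32 frames costs somewhat more than 4x in the CP module alone.

import math

def cp_module_cost(T, H, W, C, k):
    # Evaluate the O((THW) * log(THW) * (C + k)) term up to a constant factor.
    n = T * H * W
    return n * math.log2(n) * (C + k)

# Assumed values: a 14x14 feature map with 256 channels and k = 8 proposals.
cost_8 = cp_module_cost(8, 14, 14, 256, 8)
cost_32 = cp_module_cost(32, 14, 14, 256, 8)
print(cost_32 / cost_8)   # ~4.7x for the CP-module term alone

# The measured end-to-end slowdown (10.1 -> 3.9 videos/s, about 2.6x) is smaller,
# since the CP modules are only one part of the whole network.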
Visualization of CP Module
Thanks
Any Questions?
You can send mail to
Susang Kim(healess1@gmail.com)
