Tracking Emerges by Colorizing Videos
Vondrick, C., Shrivastava, A., Fathi, A., Guadarrama, S., & Murphy, K. (2018). arXiv preprint arXiv:1806.09594.
Presenter: 오유진
Key Point
✓ Visual object tracking emerges naturally from learning to convert black-and-white video into color
[Abstract]
• Teaching a machine to visually track objects is challenging
– It requires large, labeled tracking datasets for training, which are impractical to annotate at scale
– Such datasets are hard to prepare
• Proposes colorizing gray-scale video by copying colors from a reference frame
• The network learns to track objects automatically, without supervision
[Figure: example from the academic dataset DAVIS 2017]
Related Work
• Self-supervised Learning
– Training visual models without human supervision
– Training labels are derived from the input data itself
– Typically, the network is given one part of the data and predicts the rest
• Tracking without labels
– A self-supervised learning problem that causes the model to learn tracking on its own
– The same trained model is used for both tracking and colorizing, without fine-tuning or re-training
• Colorization
– Colorizing gray-scale images has been the subject of significant study in the computer vision community
– This work uses video colorization as a proxy task for learning to track
Self-supervised tracking; Model
✓ Convert all frames except the first frame to gray-scale and train a convolutional network to predict the original colors
• Given a gray-scale frame, the model computes a low-dimensional embedding for each location
• Using these embeddings, the model points from the target frame into the reference frame (solid yellow arrow)
• It then copies the color back into the predicted frame (dashed yellow arrow)
• After training, the pointing mechanism is used as a visual tracker
Self-supervised tracking; Model
• $c_i \in \mathbb{R}^d$ is the true color for pixel $i$ in the reference frame
• $c_j \in \mathbb{R}^d$ is the true color for pixel $j$ in the target frame
• $y_j \in \mathbb{R}^d$ is the model's prediction for $c_j$
• The model predicts $y_j$ as a linear combination of colors in the reference frame → $y_j = \sum_i A_{ij} c_i$
• $A$ is a similarity matrix between the target frame and the reference frame, normalized by a softmax over reference locations:
$$A_{ij} = \frac{\exp(f_i^\top f_j)}{\sum_k \exp(f_k^\top f_j)}$$
• $f_i \in \mathbb{R}^D$ is a low-dimensional embedding for pixel $i$, estimated by a CNN
• If there are two objects with the same color, the model does not constrain them to have the same embedding
[Figure: video from the DAVIS 2017 dataset]
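The copy-by-pointing rule on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `copy_colors`, the flattened pixel layout, and the random toy inputs are assumptions made for the example.

```python
import numpy as np

def copy_colors(f_ref, f_tgt, c_ref):
    """Predict target colors as a softmax-weighted sum of reference colors.

    f_ref: (N_ref, D) embeddings f_i of the reference pixels
    f_tgt: (N_tgt, D) embeddings f_j of the target pixels
    c_ref: (N_ref, d) colors c_i of the reference pixels
    Returns (y, A) with y of shape (N_tgt, d) and A of shape (N_ref, N_tgt).
    """
    logits = f_ref @ f_tgt.T                              # logits[i, j] = f_i^T f_j
    logits = logits - logits.max(axis=0, keepdims=True)   # for numerical stability
    weights = np.exp(logits)
    A = weights / weights.sum(axis=0, keepdims=True)      # softmax over reference index i
    y = A.T @ c_ref                                       # y_j = sum_i A_ij c_i
    return y, A

# Toy inputs (hypothetical sizes): 6 reference pixels, 5 target pixels, D = 4, d = 2.
rng = np.random.default_rng(0)
f_ref = rng.normal(size=(6, 4))
f_tgt = rng.normal(size=(5, 4))
c_ref = rng.uniform(size=(6, 2))
y, A = copy_colors(f_ref, f_tgt, c_ref)
```

Because each column of `A` sums to one, every predicted color is a convex combination of reference colors, so the model can only produce colors that already appear in the reference frame.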
Self-supervised tracking; Learning
• Training relies on the assumption that color is generally temporally stable
• Visualize frames one second apart from the Kinetics training set
– The first row shows the original frames
– The second row shows the ab color channels from Lab space
– The third row quantizes the color space into discrete bins and perturbs the colors to make the effect more pronounced → the bins are obtained by k-means clustering of the color channels
• Loss function: $\min_\theta \sum_j \mathcal{L}(y_j, c_j)$
– Train the parameters of the model θ such that the predicted colors 𝑦𝑗 are close to the target colors 𝑐𝑗 across the training set
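The quantized-color objective can be illustrated with a small NumPy sketch. The slide does not name the loss, so a categorical cross-entropy over one-hot color bins is assumed here; the cluster centers are fixed by hand for the example (in practice they would come from k-means), and all function names are hypothetical.

```python
import numpy as np

def quantize(ab, centers):
    """Assign each ab color (N, 2) to its nearest center (K, 2); returns (N,) bin indices."""
    dists = ((ab[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def one_hot(idx, num_bins):
    """Turn bin indices (N,) into one-hot rows (N, num_bins)."""
    return np.eye(num_bins)[idx]

def cross_entropy(y_pred, target_idx, eps=1e-12):
    """Mean of -log y_pred[j, target_idx[j]] over target pixels j."""
    picked = y_pred[np.arange(len(target_idx)), target_idx]
    return float(-np.log(picked + eps).mean())

# Hand-picked stand-ins for k-means centers in ab space (assumption).
centers = np.array([[-0.5, -0.5], [0.5, -0.5], [0.0, 0.5]])
ab_ref = np.array([[-0.4, -0.6], [0.6, -0.4], [0.1, 0.4], [0.45, -0.55]])
ab_tgt = np.array([[0.5, -0.5], [0.0, 0.6]])
ref_bins = quantize(ab_ref, centers)
tgt_bins = quantize(ab_tgt, centers)

# A model that points nowhere in particular: uniform weights over reference pixels.
A_uniform = np.full((4, 2), 0.25)
loss_uniform = cross_entropy(A_uniform.T @ one_hot(ref_bins, 3), tgt_bins)

# A model that points each target pixel at a reference pixel of the same color bin.
A_sharp = np.zeros((4, 2))
A_sharp[1, 0] = 1.0
A_sharp[2, 1] = 1.0
loss_sharp = cross_entropy(A_sharp.T @ one_hot(ref_bins, 3), tgt_bins)
```

In this toy case the sharply pointing model reaches a much lower loss than the uniform one, which is the pressure that pushes the network to learn correspondences.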
Self-supervised tracking; Learning
• Learning to copy colors from the single reference frame requires the model to learn to internally point to the right region in order to copy the right colors
• The model therefore learns an explicit pointing mechanism that can be used for tracking
[Figure: columns show Input, Reference Frame, and Predicted Colors. Examples of predicted colors from a colorized reference frame applied to input video, using the publicly available Kinetics dataset]
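Reusing the pointing mechanism for tracking amounts to copying labels instead of colors. The sketch below assumes a segmentation mask is given for the reference frame and propagates it to the target frame; the hand-built 2-D embeddings and the names `attention` and `propagate_labels` are assumptions for illustration only.

```python
import numpy as np

def attention(f_ref, f_tgt):
    """Softmax similarity A[i, j] between reference pixel i and target pixel j."""
    logits = f_ref @ f_tgt.T
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)

def propagate_labels(f_ref, f_tgt, labels_ref, num_classes):
    """Copy class labels from reference pixels to target pixels via A."""
    A = attention(f_ref, f_tgt)
    soft = A.T @ np.eye(num_classes)[labels_ref]   # (N_tgt, num_classes)
    return soft.argmax(axis=1), soft

# Toy embeddings: object pixels lie near [4, 0], background pixels near [0, 4].
f_ref = np.array([[4.0, 0.0], [4.0, 0.5], [0.0, 4.0], [0.5, 4.0]])
labels_ref = np.array([1, 1, 0, 0])                # 1 = object, 0 = background
f_tgt = np.array([[0.2, 4.0], [4.0, 0.1], [3.8, 0.0]])

pred, soft = propagate_labels(f_ref, f_tgt, labels_ref, num_classes=2)
```

No retraining is involved: the embeddings come from the colorization model, and only the quantity being copied changes.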
Implementation Details
• A 3D convolutional network produces 64-dimensional embeddings
• The network predicts a down-sampled feature map of 32 × 32 for each of the input frames
– A ResNet-18 network architecture is applied to each input frame, followed by a five-layer 3D convolutional network
– To give the features global spatial information, the spatial location is encoded as a two-dimensional vector in the range [−1, 1] and concatenated to the features between the ResNet-18 and the 3D convolutional network
• Model input: four gray-scale video frames down-sampled to 256 × 256
• The first three frames are used as reference frames and the fourth frame as the target frame
• 400,000 iterations, batch size 32, Adam optimizer
– Learning rate of 0.001 for the first 60,000 iterations, reduced to 0.0001 afterwards
– The model is randomly initialized with Gaussian noise
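Two of the details above are easy to show concretely: the coordinate channels in [−1, 1] and the step learning-rate schedule. The following is a rough sketch; the channel count of 256 and the function names `add_coords` and `learning_rate` are assumptions, not values taken from the slides.

```python
import numpy as np

def add_coords(features):
    """Append normalized (row, col) coordinates in [-1, 1] to an (H, W, C) feature map."""
    H, W, _ = features.shape
    rows, cols = np.meshgrid(np.linspace(-1.0, 1.0, H),
                             np.linspace(-1.0, 1.0, W),
                             indexing="ij")
    coords = np.stack([rows, cols], axis=-1)             # (H, W, 2)
    return np.concatenate([features, coords], axis=-1)   # (H, W, C + 2)

def learning_rate(iteration):
    """0.001 for the first 60,000 iterations, 0.0001 afterwards."""
    return 1e-3 if iteration < 60_000 else 1e-4

# Hypothetical 32 x 32 feature map with 256 channels.
feats = np.zeros((32, 32, 256))
feats_with_coords = add_coords(feats)
```

The two extra channels let later layers tell apart locations that look identical, which matters when similar textures appear in different parts of the frame.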
Experiments
• The model is trained on the training set from Kinetics (the labels are removed before use)
– The Kinetics dataset is a diverse collection of 300,000 videos from YouTube
– The model is evaluated on the standard testing sets of other datasets, depending on the task
– It is compared against the following unsupervised baselines
• Optical Flow: extract feature points that appear salient in the previous frame (and that can also be extracted in the next frame), then find where the same feature points appear in the current frame
• Single Image Colorization: evaluates how well similarity computed from the embeddings of a single-image colorization model works in place of our embeddings
See http://hs36.tistory.com/47
Experiments
• The figure on the left shows examples of video colorization results produced by the model given a reference frame (Kinetics validation set)
– The model learns to copy colors over many challenging transformations
– For example, butter spreading or people dancing
– The model adapts to a variety of difficult tracking situations
Experiments
• Video segmentation average performance versus time in the video
• More consistent performance for longer time periods than optical flow
– Optical flow on average degrades to the identity baseline. Since videos are variable length
• The average performance broken down by attributes that describe the type of motion in the video
• The attributes are sorted by relative gain over optical flow
Experiments
• Human Pose Tracking
• Tracks human poses given key-points in an initial frame
– JHMDB academic dataset
• At a strict threshold, the model tracks key-points with performance similar to optical flow
[Figure: examples of using the model to track movements of the human skeleton. From ai.googleblog]
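Key-point tracking can be sketched with the same similarity matrix: each key-point is treated as a one-hot heatmap on the reference grid, propagated to the target grid, and read out at its peak. This is an illustrative assumption about the read-out, not the evaluation code behind the slide; the toy embeddings, the wrap-around shift, and the name `track_keypoints` are made up for the example.

```python
import numpy as np

def attention(f_ref, f_tgt):
    """Softmax similarity A[i, j] between reference pixel i and target pixel j."""
    logits = f_ref @ f_tgt.T
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)

def track_keypoints(f_ref, f_tgt, keypoints_rc, grid_hw):
    """Move (row, col) key-points from the reference grid to the target grid."""
    H, W = grid_hw
    A = attention(f_ref, f_tgt)                    # (N_ref, N_tgt)
    flat = [r * W + c for r, c in keypoints_rc]
    heat_tgt = A[flat, :]                          # propagated one-hot heatmaps, (K, N_tgt)
    best = heat_tgt.argmax(axis=1)                 # peak location per key-point
    return [tuple(int(v) for v in np.unravel_index(b, (H, W))) for b in best]

# Toy 2 x 3 grid: every reference pixel has a unique embedding, and the target
# frame is the reference content shifted one column to the right (with wrap-around).
H, W = 2, 3
f_ref = 5.0 * np.eye(H * W)
f_tgt = np.roll(f_ref.reshape(H, W, -1), shift=1, axis=1).reshape(H * W, -1)

tracked = track_keypoints(f_ref, f_tgt, [(0, 0), (1, 2)], (H, W))
```

Here `tracked` gives the new grid cell of each key-point; on real features the peak is softer, so a weighted average around the maximum may be a more robust read-out.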
Conclusion
• The task of video colorization is a promising signal for learning to track without requiring human supervision
• Learning to colorize video by pointing to a colorful reference frame causes a visual tracker to automatically emerge, which we leverage for video segmentation and human pose tracking
• Improving the video colorization task may translate into improvements in self-supervised tracking

Tracking emerges by colorizing videos

  • 1. Tracking Emerges by Colorizing Videos Vondrick, C., Shrivastava, A., Fathi, A., Guadarrama, S., & Murphy, K. (2018). arXiv preprint arXiv:1806.09594. 발표자 : 오유진
  • 2. Key Point ✓ Visual tracking of objects naturally by converting black-and-white images into color [Abstract] • Teaching a machine to visually track objects is challenging – It requires large, labeled tracking datasets for training, which are impractical to annotate at scale – Hard to prepare image datasets • Suggest how to color the grayscale video by copying colors from a reference frame • Network automatically tracks objects without supervision academic dataset DAVIS 2017
  • 3. Related Work • Self-supervised Learning – Training visual models without human supervision – Training labels are decided by input data – Typically, network uses a piece of data and predict the rest • Tracking without label – Self-supervised learning problem that causes the model to automatically learn tracking on its own – Using the same trained model to tracking and colorizing without fine-tuning or re-training • Colorization – Colorizing gray-scale images has been the subject of significant study in the computer vision community – Use video colorization as a proxy task for learning to track
  • 4. Self-supervised tracking; Model
      ✓ Convert all frames except the first to gray-scale and train a convolutional network to predict the original colors
      • Given a gray-scale frame, the model computes a low-dimensional embedding for each location
      • Using these embeddings, it points from the target frame into the reference frame (solid yellow arrow)
      • It then copies the color back into the predicted frame (dashed yellow arrow)
      • After training, the pointing mechanism is used as a visual tracker
  • 5. Self-supervised tracking; Model
      • $c_i \in \mathbb{R}^d$ is the true color for pixel $i$ in the reference frame
      • $c_j \in \mathbb{R}^d$ is the true color for pixel $j$ in the target frame
      • $y_j \in \mathbb{R}^d$ is the model's prediction for $c_j$
      • The model predicts $y_j$ as a linear combination of colors in the reference frame: $y_j = \sum_i A_{ij} c_i$
      • $A$ is a similarity matrix between the target frame and the reference frame: $A_{ij} = \frac{\exp(f_i^T f_j)}{\sum_k \exp(f_k^T f_j)}$
      • $f_i \in \mathbb{R}^D$ is a low-dimensional embedding for pixel $i$, estimated by a CNN
      • If two objects have the same color, the model does not constrain them to have the same embedding
      Figure: video from the DAVIS 2017 dataset
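The pointing step on slide 5 can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names and the toy embeddings and colors are assumptions made for the example, and in the real model the embeddings come from a CNN.

```python
import math

def attention_weights(ref_emb, tgt_emb):
    """Return A where A[i][j] is a softmax over reference pixels i of f_i . f_j."""
    n_ref, n_tgt = len(ref_emb), len(tgt_emb)
    A = [[0.0] * n_tgt for _ in range(n_ref)]
    for j, fj in enumerate(tgt_emb):
        scores = [sum(a * b for a, b in zip(fi, fj)) for fi in ref_emb]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        for i in range(n_ref):
            A[i][j] = exps[i] / z
    return A

def copy_colors(A, ref_colors):
    """Compute y_j = sum_i A[i][j] * c_i for every target pixel j."""
    n_ref, n_tgt, d = len(ref_colors), len(A[0]), len(ref_colors[0])
    return [
        [sum(A[i][j] * ref_colors[i][k] for i in range(n_ref)) for k in range(d)]
        for j in range(n_tgt)
    ]

# Toy data (hypothetical): two reference pixels, two target pixels.
ref_emb = [[5.0, 0.0], [0.0, 5.0]]
tgt_emb = [[5.0, 0.0], [0.0, 5.0]]
ref_colors = [[1.0, 0.0], [0.0, 1.0]]

A = attention_weights(ref_emb, tgt_emb)
pred = copy_colors(A, ref_colors)  # each target pixel takes the color of its best match
```

Each column of A sums to one, so every predicted color is a convex combination of reference colors.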
  • 6. Self-supervised tracking; Learning
      • Training assumes that color is generally temporally stable
      • Figure: frames one second apart from the Kinetics training set
          – The first row shows the original frames
          – The second row shows the ab color channels from Lab space
          – The third row quantizes the color space into discrete bins and perturbs the colors to make the effect more pronounced
          → The color channels are clustered with k-means
      • Loss function: $\min_\theta \sum_j \mathcal{L}(y_j, c_j)$
          – Train the model parameters $\theta$ so that the predicted colors $y_j$ are close to the target colors $c_j$ across the training set
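One way to read the loss on slide 6 together with the quantized color bins is as a classification problem. The sketch below assumes a cross-entropy loss over the bins, which the slide does not spell out; the two centroids, the colors, and the attention weights are made-up values, not k-means output.

```python
import math

def nearest_bin(color, centroids):
    """Index of the centroid closest to `color` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(color, centroids[k])),
    )

def predicted_distribution(weights, ref_bins, n_bins):
    """Sum the attention weights of one target pixel over the bins of the reference pixels."""
    p = [0.0] * n_bins
    for w, b in zip(weights, ref_bins):
        p[b] += w
    return p

def cross_entropy(p, true_bin, eps=1e-12):
    """Negative log-probability assigned to the true color bin."""
    return -math.log(p[true_bin] + eps)

# Hypothetical ab-space centroids (in practice these would come from k-means).
centroids = [(-50.0, -50.0), (50.0, 50.0)]
ref_colors = [(-40.0, -45.0), (60.0, 40.0)]
ref_bins = [nearest_bin(c, centroids) for c in ref_colors]

weights = [0.9, 0.1]  # assumed attention weights for one target pixel
target_bin = nearest_bin((-55.0, -30.0), centroids)

p = predicted_distribution(weights, ref_bins, len(centroids))
loss = cross_entropy(p, target_bin)
```

The loss shrinks as the model puts more attention weight on reference pixels whose color bin matches the target pixel.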
  • 7. Self-supervised tracking; Learning
      • Learning to copy colors from a single reference frame requires the model to learn to point internally to the right region in order to copy the right colors
      • The model thereby learns an explicit pointing mechanism that can be used for tracking
      Figure (panels: Input, Reference Frame, Predicted Colors): examples of predicted colors from a colorized reference frame applied to input video, using the publicly available Kinetics dataset
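To illustrate how the pointing mechanism from slide 7 could act as a tracker, the sketch below copies segmentation labels instead of colors. It is a hedged illustration: `propagate_labels`, the hand-written similarity matrix, and the labels are hypothetical, and a trained model would supply the matrix.

```python
def propagate_labels(A, ref_labels, n_classes):
    """For each target pixel j, pick the class with the largest total weight A[i][j]."""
    n_tgt = len(A[0])
    result = []
    for j in range(n_tgt):
        votes = [0.0] * n_classes
        for i, label in enumerate(ref_labels):
            votes[label] += A[i][j]
        result.append(max(range(n_classes), key=votes.__getitem__))
    return result

# Hypothetical similarity matrix: rows are reference pixels, columns are target pixels.
A = [
    [0.80, 0.10],
    [0.15, 0.20],
    [0.05, 0.70],
]
ref_labels = [1, 1, 0]  # 1 = object, 0 = background in the reference frame

target_labels = propagate_labels(A, ref_labels, n_classes=2)
```

Only the quantity being copied changes; the weights are the same ones used for colorization.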
  • 8. Implementation Details
      • A 3D convolutional network produces 64-dimensional embeddings
      • The network predicts a down-sampled 32 × 32 feature map for each input frame
          – Each input frame is processed by a ResNet-18 architecture, followed by a five-layer 3D convolutional network
          – To give the features global spatial information, the spatial location is encoded as a two-dimensional vector in the range [−1, 1] and concatenated to the features between the ResNet-18 and the 3D convolutional network
      • Model input: four gray-scale video frames down-sampled to 256 × 256
      • The first three frames are used as reference frames and the fourth as the target frame
      • Training: 400,000 iterations, batch size 32, Adam optimizer
          – Learning rate of 0.001 for the first 60,000 iterations, reduced to 0.0001 afterwards
          – The model is randomly initialized with Gaussian noise
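Two details from slide 8 are easy to show concretely: the [−1, 1] location encoding and the step-wise learning rate. The sketch below is an assumption-laden illustration in plain Python; the (row, column) ordering of the coordinates and all function names are choices made for the example, not taken from the slides.

```python
def coord_grid(height, width):
    """Grid of (row, col) positions, each coordinate scaled to [-1, 1]."""
    def norm(k, n):
        return 2.0 * k / (n - 1) - 1.0 if n > 1 else 0.0
    return [[(norm(r, height), norm(c, width)) for c in range(width)]
            for r in range(height)]

def append_coords(features):
    """Concatenate the 2-D location to each feature vector: H x W x C -> H x W x (C + 2)."""
    h, w = len(features), len(features[0])
    grid = coord_grid(h, w)
    return [[list(features[r][c]) + list(grid[r][c]) for c in range(w)]
            for r in range(h)]

def learning_rate(step):
    """0.001 for the first 60,000 iterations, 0.0001 afterwards."""
    return 1e-3 if step < 60_000 else 1e-4

grid = coord_grid(32, 32)                       # matches the 32 x 32 feature map
feats = [[[0.5], [0.5]], [[0.5], [0.5]]]        # toy 2 x 2 map with 1 channel
with_coords = append_coords(feats)
```

Without the location channels, a convolutional feature has no direct way to tell where in the frame it sits.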
  • 9. Experiments
      • The model is trained on the Kinetics training set (with the labels removed)
          – Kinetics is a diverse collection of 300,000 videos from YouTube
          – The model is evaluated on the standard test sets of other datasets, depending on the task
          – It is compared against the following unsupervised baselines
      • Optical Flow: after extracting feature points that appear important in the previous frame (and that can also be extracted in the next frame), visualize how many of the same feature points are found in the current frame
      • Single Image Colorization: evaluates how well similarity computed from the embeddings of a single-image colorization model works in place of the proposed embeddings
      Reference: http://hs36.tistory.com/47
  • 10. Experiments
      • The picture on the left shows example results that the model produces from a reference frame (Kinetics validation set)
          – The model learns to copy colors across many challenging transformations
          – Examples include butter spreading and people dancing
          – The model adapts to a variety of difficult tracking situations
  • 11. Experiments
      • Plot: average video segmentation performance versus time in the video
      • Performance stays more consistent over longer time periods than optical flow
          – Optical flow on average degrades to the identity baseline
          – Note: videos are of variable length
      • Plot: average performance broken down by attributes that describe the type of motion in the video
      • The attributes are sorted by relative gain over optical flow
  • 12. Experiments
      • Human Pose Tracking
      • Track human poses given key-points in an initial frame
          – JHMDB academic dataset
      • At a strict threshold, the model tracks key-points with performance similar to optical flow
      Figure: examples of using the model to track movements of the human skeleton. Source: ai.googleblog
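The "strict threshold" on slide 12 suggests a distance-based key-point accuracy. The slide does not name the metric, so the sketch below assumes a PCK-style score (fraction of key-points within a fraction `alpha` of a reference size); the function name, coordinates, and reference size are illustrative only.

```python
import math

def keypoint_accuracy(pred, truth, ref_size, alpha):
    """Fraction of key-points whose error is at most alpha * ref_size."""
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= alpha * ref_size
    )
    return hits / len(truth)

# Hypothetical key-points in pixels; errors are 1, 8 and 25 pixels.
truth = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0)]
pred = [(11.0, 10.0), (58.0, 50.0), (90.0, 45.0)]

loose = keypoint_accuracy(pred, truth, ref_size=100.0, alpha=0.1)    # threshold 10 px
strict = keypoint_accuracy(pred, truth, ref_size=100.0, alpha=0.05)  # threshold 5 px
```

A smaller `alpha` only counts key-points that are tracked very precisely, which is what makes a threshold strict.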
  • 13. Conclusion
      • The task of video colorization is a promising signal for learning to track without requiring human supervision
      • Learning to colorize video by pointing to a colorful reference frame causes a visual tracker to automatically emerge, which we leverage for video segmentation and human pose tracking
      • Improving the video colorization task may translate into improvements in self-supervised tracking