UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 1
Lecture 6: Structured Prediction with ConvNets
Deep Learning @ UvA
o What do convolutions look like?
o Build on the visual intuition behind Convnets
o Deep Learning Feature maps
o Transfer Learning
Previous Lecture
o What is structured prediction?
o Can we repurpose for structured prediction?
o Structured losses on ConvNets
o Multi-task learning with ConvNets
Lecture Overview
What is structured prediction?
o N-way classification
Standard inference
Dog? Cat? Bike? Car? Plane?
o N-way classification
o Regression
Standard inference
How popular will this movie be in IMDB?
o N-way classification
o Regression
o Ranking
o …
Standard inference
Who is older?
o Do all our machine learning tasks boil down to “single value” predictions?
o Are there tasks where outputs are somehow correlated?
o Is there some structure in these output correlations?
o How can we predict such structures? → Structured prediction
What do they have in common?
o They all make “single value” predictions
o Do all our machine learning tasks boil down to “single value” predictions?
What do they have in common?
o Predict a box around an object
o Images
◦ Spatial location → b(ounding) box
o Videos
◦ Spatio-temporal location → bbox@t, bbox@t+1, …
Object detection
Object Segmentation
Optical flow & motion estimation
Depth estimation
Godard et al., Unsupervised Monocular Depth Estimation with Left-Right Consistency, 2016
Normals and reflectance estimation
Wang et al., Designing deep networks for surface normal estimation, 2015
Rematas et al., Deep Reflectance Maps, 2016
Sentence parsing
Machine translation
o Speech synthesis
o Captioning
o Robot control
o Pose estimation
o …
And many more
o Prediction goes beyond asking for “single values”
o Outputs are complex and output dimensions correlated
What is common?
o Prediction goes beyond asking for “single values”
o Outputs are complex and output dimensions correlated
o Output dimensions have latent structure
o Can we make deep networks return structured predictions?
Structured prediction
ConvNets for structured prediction
o SPPnet [He2014]
o Fast R-CNN [Girshick2015]
Sliding window on feature maps
o Process the whole image up to conv5
Fast R-CNN: Steps
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
o Process the whole image up to conv5
o Compute possible locations for objects
Fast R-CNN: Steps
o Process the whole image up to conv5
o Compute possible locations for objects (some correct, most wrong)
Fast R-CNN: Steps
o Process the whole image up to conv5
o Compute possible locations for objects
o Given a single location → ROI pooling module extracts a fixed-length feature
Fast R-CNN: Steps
Always 4x4 no matter the size of candidate location
o Process the whole image up to conv5
o Compute possible locations for objects
o Given a single location → ROI pooling module extracts a fixed-length feature
o Connect to two final layers: one for classification, one for box refinement
Fast R-CNN: Steps
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map → ROI Pooling Module → “Car, dog or bicycle?” and “New box coordinates”]
Always 3x3 no matter the size of candidate location
o Divide the feature map into T×T cells
◦ The cell size will change depending on the size of the candidate location
Region-of-Interest (ROI) Pooling Module
Always 3x3 no matter the size of candidate location
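The ROI pooling idea above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the Fast R-CNN implementation: the function name `roi_max_pool`, the single-channel list-of-lists feature map, and the integer rounding of cell boundaries are all assumptions made for the example.

```python
def roi_max_pool(feature_map, box, T=3):
    """Max-pool the region `box` of a 2-D feature map into a T x T grid.

    feature_map: list of rows (H x W), a single channel for simplicity.
    box: (x0, y0, x1, y1) in feature-map coordinates, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    pooled = []
    for i in range(T):
        # Cell boundaries scale with the box, so larger boxes get larger cells.
        r_start = y0 + (i * height) // T
        r_end = max(y0 + ((i + 1) * height) // T, r_start + 1)
        row = []
        for j in range(T):
            c_start = x0 + (j * width) // T
            c_end = max(x0 + ((j + 1) * width) // T, c_start + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


# Two candidate locations of different size both give a T x T output.
fmap = [[r * 8 + c for c in range(8)] for r in range(6)]
big = roi_max_pool(fmap, (0, 0, 6, 6), T=3)    # 6x6 box, 2x2 cells
small = roi_max_pool(fmap, (2, 1, 5, 4), T=3)  # 3x3 box, 1x1 cells
```

Whatever the box size, the output has T×T entries per channel, which is what lets the following fully connected layers expect a fixed-length input.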
o Normally, samples in a mini-batch are completely random
o Instead, organize mini-batches by ROIs
o 1 mini-batch = N (images) × R/N (candidate locations)
o Feature maps shared → training speed-up by a factor of R/N
o Mini-batch samples might be correlated
◦ In Fast R-CNN this was not observed to cause problems
Smart fine-tuning
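A minimal sketch of this sampling scheme, assuming candidate boxes are stored per image in a dictionary. The function name `sample_roi_minibatch`, the data layout, and the default values N=2 and R=128 are illustrative choices for this example, not something the slides prescribe.

```python
import random


def sample_roi_minibatch(rois_per_image, N=2, R=128, seed=0):
    """Pick N images, then R/N candidate locations from each of them.

    rois_per_image: dict mapping an image id to its list of candidate boxes.
    Returns a list of (image_id, box) pairs.
    """
    rng = random.Random(seed)
    images = rng.sample(sorted(rois_per_image), N)
    per_image = R // N
    batch = []
    for image_id in images:
        candidates = rois_per_image[image_id]
        # All boxes of one image share that image's conv feature map,
        # so the expensive conv layers run once per image, not once per box.
        chosen = rng.sample(candidates, min(per_image, len(candidates)))
        batch.extend((image_id, box) for box in chosen)
    return batch


rois = {f"img{i}": [(j, j, j + 10, j + 10) for j in range(100)] for i in range(5)}
batch = sample_roi_minibatch(rois, N=2, R=128)
```

With these values the batch holds 128 boxes but only 2 images need a forward pass through the convolutional layers, which is where the R/N factor comes from.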
Some results
o Reuse convolutions for different candidate boxes
◦ Compute feature maps only once
o Region-of-Interest pooling
◦ Define the stride relative to the box → box width divided by a predefined number of “poolings” T
◦ Fixed length vector
o End-to-end training!
o (Very) Accurate object detection
o (Much) Faster
◦ Less than a second per image
o External box proposals needed
Fast-RCNN
T=5
o Fast R-CNN → external candidate locations
o Faster R-CNN → deep network proposes candidate locations
o Slide the feature map → 𝑘 anchor boxes per slide
Faster R-CNN [Ren2016]
Region Proposal Network
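To make "𝑘 anchor boxes per slide" concrete, here is a hedged sketch that enumerates anchors for every position of a feature map. The function name `make_anchors`, the stride of 16, and the particular scales and aspect ratios (giving k = 9) are assumptions for illustration only.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchor boxes per feature-map position.

    Boxes are (x0, y0, x1, y1) in image coordinates.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this sliding position, mapped back to the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(2, 3)  # 2 x 3 positions, 9 anchors each
```

The region proposal network would then score each anchor as object or background and regress a refinement of its coordinates; that part is not shown here.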
o Image larger than network input → slide the network
Going Fully Convolutional [LongCVPR2014]
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → fc1 → fc2, answering “Is this pixel a camel?” (Yes! / No!) at each position]
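Sliding the network over a larger image can be illustrated with a single fully connected unit. The sketch below assumes a one-channel image and a unit whose weights are laid out as a k×k grid; the names `fc_score` and `slide_network` are hypothetical. Applying the unit at every position produces a score map, which is the same computation as a convolution whose kernel is the weight grid.

```python
def fc_score(patch, weights, bias):
    """One fully connected unit applied to a flattened k x k patch."""
    flat_x = [v for row in patch for v in row]
    flat_w = [v for row in weights for v in row]
    return sum(x * w for x, w in zip(flat_x, flat_w)) + bias


def slide_network(image, weights, bias):
    """Apply the unit at every valid position of a larger image.

    The result is a 2-D score map rather than a single score.
    """
    k = len(weights)
    height, width = len(image), len(image[0])
    return [[fc_score([r[c:c + k] for r in image[y:y + k]], weights, bias)
             for c in range(width - k + 1)]
            for y in range(height - k + 1)]


image = [[1.0] * 5 for _ in range(4)]    # 4 x 5 input, larger than the 3 x 3 unit
weights = [[1.0] * 3 for _ in range(3)]
score_map = slide_network(image, weights, bias=0.5)  # 2 x 3 map of scores
```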
Fully Convolutional Networks [LongCVPR2014]
o Connect intermediate layers to output
o Output is too coarse
◦ Image Size 500x500, AlexNet Input Size: 227x227 → Output: 10x10
o How to obtain dense predictions?
o Upconvolution
◦ Other names: deconvolution, transposed convolution, fractionally-strided convolutions
Fully Convolutional Networks [LongCVPR2014]
Deconvolutional modules
[Figure panels: Convolution (no padding, no strides), Image → Output; Upconvolution (no padding, no strides); Upconvolution (padding, strides)]
More visualizations: https://github.com/vdumoulin/conv_arithmetic
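As a rough illustration of what an upconvolution computes, the sketch below implements a transposed convolution without padding in plain Python: every input value "stamps" a scaled copy of the kernel onto the output. The function name `upconv2d` and the single-channel, square-kernel setting are assumptions for the example.

```python
def upconv2d(x, kernel, stride=1):
    """Transposed convolution (no padding) of a 2-D input with a k x k kernel.

    Output size per dimension is (input_size - 1) * stride + k.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            # Each input value adds a scaled copy of the kernel to the output.
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


# No padding, no strides: a 2x2 input and a 3x3 kernel give a 4x4 output.
out = upconv2d([[1, 2], [3, 4]], [[1] * 3 for _ in range(3)], stride=1)

# Stride 2 with a 2x2 kernel doubles the resolution: 7x7 -> 14x14.
coarse = [[1.0] * 7 for _ in range(7)]
fine = upconv2d(coarse, [[1.0, 1.0], [1.0, 1.0]], stride=2)
```

In a network the kernel values are learned, so the upsampling is trained together with the rest of the model instead of being a fixed interpolation.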
Coarse → Fine Output
[Figure: 7x7 → 14x14 → 224x224, via 2x upconvolution steps]
Pixel label probabilities: 0.8 0.1 0.9
Ground truth pixel labels: 1 0 0
Small loss generated where the probability is close to the ground truth
Large loss generated for 0.9 vs. 0 (probability much higher than ground truth)
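The per-pixel loss in this example can be worked through numerically. The slide does not name the loss function, so the sketch below assumes a binary cross-entropy per pixel; the function name `pixel_cross_entropy` is hypothetical.

```python
import math


def pixel_cross_entropy(prob, label):
    """Binary cross-entropy for one pixel (assumed loss, see lead-in)."""
    return -math.log(prob) if label == 1 else -math.log(1.0 - prob)


probs = [0.8, 0.1, 0.9]   # predicted pixel label probabilities
labels = [1, 0, 0]        # ground truth pixel labels
losses = [pixel_cross_entropy(p, y) for p, y in zip(probs, labels)]
# The third pixel (0.9 predicted, 0 in the ground truth) has by far the largest loss.
```

Because the loss is computed for every output pixel, the gradient tells each location of the dense prediction how to move towards its own label.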
Structured losses
o Segmentation map is good but not pixel-precise
◦ Details around boundaries are lost
o Cast fully convolutional outputs as unary potentials
o Consider pairwise potentials between output dimensions
Deep ConvNets with CRF loss [Chen, Papandreou 2016]
o Segmentation map is good but not pixel-precise
◦ Details around boundaries are lost
o Cast fully convolutional outputs as unary potentials
o Consider pairwise potentials between output dimensions
o Include Fully Connected CRF loss to refine segmentation
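$E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j)$ (total loss = unary loss + pairwise loss)
$\theta_{ij}(x_i, x_j) \sim w_1 \exp\left(-\alpha \|p_i - p_j\|^2 - \beta \|I_i - I_j\|^2\right) + w_2 \exp\left(-\gamma \|p_i - p_j\|^2\right)$
Deep ConvNets with CRF loss [Chen, Papandreou 2016]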
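The energy can be evaluated directly for a handful of pixels. The sketch below is illustrative only: the function names, the default values of w1, w2, alpha, beta and gamma, and the choice to charge the pairwise term only when two pixels take different labels (a Potts-style compatibility) are assumptions, not taken from the slide. Here p is a pixel position and I its colour.

```python
import math


def pairwise_potential(p_i, p_j, I_i, I_j,
                       w1=1.0, w2=1.0, alpha=0.01, beta=0.01, gamma=0.1):
    """Appearance kernel (position and colour) plus smoothness kernel (position)."""
    pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
    col = sum((a - b) ** 2 for a, b in zip(I_i, I_j))
    return w1 * math.exp(-alpha * pos - beta * col) + w2 * math.exp(-gamma * pos)


def crf_energy(labels, unary, positions, colors):
    """E(x) = sum of unary terms + sum of pairwise terms over all pixel pairs."""
    energy = sum(unary[i][labels[i]] for i in range(len(labels)))
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] != labels[j]:  # assumed: only disagreeing pairs are penalised
                energy += pairwise_potential(positions[i], positions[j],
                                             colors[i], colors[j])
    return energy


unary = [[0.1, 2.0], [0.2, 1.5]]          # unary[i][label], from the ConvNet output
positions = [(0, 0), (0, 1)]              # two neighbouring pixels
colors = [(10, 10, 10), (12, 11, 10)]     # with similar colours
same_labels = crf_energy([0, 0], unary, positions, colors)
mixed_labels = crf_energy([0, 1], unary, positions, colors)
```

Nearby pixels with similar colour pay a large penalty for taking different labels, which is how the pairwise term sharpens the segmentation around object boundaries.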
Deep ConvNets with CRF loss: Examples
o Per image we can predict boundaries, segmentation, detection, …
◦ Why separately?
o Solve multiple tasks simultaneously
o One task might help learn another better
o One task might have more annotations
o In real applications we don’t want 7 VGGnets
◦ 1 for boundaries, 1 for normals, 1 for saliency, …
One image → Several tasks
o The total loss is the sum of the per-task losses
o The per-task loss relies on the common weights (VGGnet) and the weights specialized for the task
$\mathcal{L}_{total} = \sum_{task} \mathcal{L}_{task}(\theta_{common}, \theta_{task}) + \mathcal{R}(\theta_{task})$
o One training image might contain annotations for specific tasks only
◦ Only a particular task is “run” for that image
o Gradients per image are computed only for the tasks available for that image
Multi-task learning
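A small sketch of how such a loss could be assembled for one training image. Everything named here is hypothetical: `multitask_loss`, the squared-error placeholder loss, the L2 form of the regularizer, and the decision to regularize only the tasks that are active for the image are assumptions for illustration.

```python
def l2_penalty(weights, strength=1e-4):
    """Assumed form of the regularizer R(theta_task)."""
    return strength * sum(w * w for w in weights)


def squared_error(prediction, target):
    """Placeholder per-task loss; real tasks would each use their own loss."""
    return (prediction - target) ** 2


def multitask_loss(predictions, annotations, task_losses, task_weights):
    """Sum the per-task losses, skipping tasks this image has no annotation for."""
    total = 0.0
    active = []
    for task, loss_fn in task_losses.items():
        if task not in annotations:
            continue  # no label for this task on this image: no loss, no gradient
        total += loss_fn(predictions[task], annotations[task])
        total += l2_penalty(task_weights[task])
        active.append(task)
    return total, active


losses = {"boundaries": squared_error, "saliency": squared_error}
total, active = multitask_loss(
    predictions={"boundaries": 0.5, "saliency": 0.2},
    annotations={"boundaries": 1.0},            # this image has boundary labels only
    task_losses=losses,
    task_weights={"boundaries": [1.0, 2.0], "saliency": [3.0]},
)
```

Only the active tasks contribute to the total, so only their task-specific weights, plus the common weights they share, receive gradients for this image.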
Ubernet [Kokkinos2016]
Ubernet: Backpropagation
Naïve backpropagation Ubernet backpropagation
o So far, what have you noticed?
o Most of these works were done in 2016
◦ Very fast, very active, very volatile area of research that attracts lots of interest
Question
Summary
o What is structured prediction?
o Can we repurpose for structured prediction?
o Structured losses on ConvNets
o Multi-task learning with ConvNets
o http://www.deeplearningbook.org/
◦ Part III: Chapter 16
[Kokkinos2016] Kokkinos. UberNet: Training a `Universal' Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory, arXiv, 2016
[Rematas2016] Rematas, Ritschel, Fritz, Gavves, Tuytelaars. Deep Reflectance Maps, CVPR, 2016
[Ren2016] Ren, He, Girshick, Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015
[Girshick2015] Girshick. Fast R-CNN, ICCV, 2015
[Wang2015] Wang, Fouhey, Gupta. Designing Deep Networks for Surface Normal Estimation, arXiv, 2015
[Chen2014] Chen, Papandreou, Kokkinos, Murphy, Yuille. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, arXiv, 2014
[He2014] He, Zhang, Ren, Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV, 2014
Reading material & references
Next lecture
o Deep Learning and Natural Language
o Invited lecture given by Prof. Christof Monz

 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 

lecture6.pdf

  • 1. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 1 Lecture 6: Structured Prediction with ConvNets Deep Learning @ UvA
  • 2. Previous Lecture: o What do convolutions look like? o Build on the visual intuition behind ConvNets o Deep learning feature maps o Transfer learning
  • 3. Lecture Overview: o What is structured prediction? o Can we repurpose ConvNets for structured prediction? o Structured losses on ConvNets o Multi-task learning with ConvNets
  • 4. What is structured prediction?
  • 5. Standard inference: o N-way classification (e.g., Dog? Cat? Bike? Car? Plane?)
  • 6. Standard inference: o N-way classification o Regression (e.g., How popular will this movie be on IMDb?)
  • 7. Standard inference: o N-way classification o Regression o Ranking (e.g., Who is older?) o …
  • 8. What do they have in common? o Do all our machine learning tasks boil down to “single value” predictions? o Are there tasks where outputs are somehow correlated? o Is there some structure in these output correlations? o How can we predict such structures? → Structured prediction
  • 9. What do they have in common? o They all make “single value” predictions o Do all our machine learning tasks boil down to “single value” predictions?
  • 10. Object detection: o Predict a box around an object o Images ◦ Spatial location → b(ounding) box o Videos ◦ Spatio-temporal location → bbox@t, bbox@t+1, …
  • 11. Object segmentation
  • 12. Optical flow & motion estimation
  • 13. Depth estimation. Godard et al., Unsupervised Monocular Depth Estimation with Left-Right Consistency, 2016
  • 14. Normals and reflectance estimation. Wang et al., Designing Deep Networks for Surface Normal Estimation, 2015; Rematas et al., Deep Reflectance Maps, 2016
  • 15. Sentence parsing
  • 16. Machine translation
  • 17. And many more: o Speech synthesis o Captioning o Robot control o Pose estimation o …
  • 18. What is common? o Prediction goes beyond asking for “single values” o Outputs are complex and output dimensions are correlated
  • 19. Structured prediction: o Prediction goes beyond asking for “single values” o Outputs are complex and output dimensions are correlated o Output dimensions have latent structure o Can we make deep networks return structured predictions?
  • 21. ConvNets for structured prediction
  • 22. Sliding window on feature maps: o SPPnet [He2014] o Fast R-CNN [Girshick2015]
  • 23. Fast R-CNN steps: o Process the whole image up to conv5 [Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
  • 24. Fast R-CNN steps: o Process the whole image up to conv5 o Compute possible locations for objects [Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
  • 25. Fast R-CNN steps: o Process the whole image up to conv5 o Compute possible locations for objects (some correct, most wrong) [Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
  • 26. Fast R-CNN steps: o Process the whole image up to conv5 o Compute possible locations for objects o Given a single location → ROI pooling module extracts a fixed-length feature [Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map; always 4x4 no matter the size of candidate location]
  • 27. Fast R-CNN steps: o Process the whole image up to conv5 o Compute possible locations for objects o Given a single location → ROI pooling module extracts a fixed-length feature o Connect to two final layers, 1 for classification, 1 for box refinement [Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map → ROI pooling module (always 3x3 no matter the size of candidate location) → “Car, dog or bicycle?” and new box coordinates]
  • 28. Region-of-Interest (ROI) Pooling Module: o Divide the feature map into T×T cells ◦ The cell size will change depending on the size of the candidate location [Figure: always 3x3 no matter the size of candidate location]
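
The ROI pooling idea on this slide can be illustrated with a minimal NumPy sketch. It assumes max pooling inside each cell and a box given in feature-map coordinates; the function name `roi_max_pool` and the (H, W, C) layout are choices made for this example, not something stated in the slides.

```python
import numpy as np

def roi_max_pool(feature_map, box, T=3):
    """Max-pool the region `box` of an (H, W, C) feature map into a (T, T, C) grid.

    `box` is (y0, x0, y1, x1) in feature-map coordinates, end-exclusive.
    (Hypothetical helper for illustration only.)
    """
    y0, x0, y1, x1 = box
    region = feature_map[y0:y1, x0:x1]
    h, w, c = region.shape
    # Cell boundaries scale with the region size, so the output is always T x T.
    ys = np.linspace(0, h, T + 1)
    xs = np.linspace(0, w, T + 1)
    out = np.empty((T, T, c), dtype=feature_map.dtype)
    for i in range(T):
        for j in range(T):
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            out[i, j] = region[ya:yb, xa:xb].max(axis=(0, 1))
    return out

fmap = np.random.default_rng(0).random((13, 13, 8))
small = roi_max_pool(fmap, (2, 3, 6, 8))     # a 4 x 5 candidate location
large = roi_max_pool(fmap, (0, 0, 13, 11))   # a 13 x 11 candidate location
```

Both candidate locations yield a feature of the same shape, which is what lets a fixed-size classifier sit on top of boxes of any size.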
  • 29. Smart fine-tuning: o Normally, samples in a mini-batch are completely random o Instead, organize mini-batches by ROIs o 1 mini-batch = $N$ (images) × $R/N$ (candidate locations) o Feature maps shared → training speed-up by a factor of $R/N$ o Mini-batch samples might be correlated ◦ In Fast R-CNN that was not observed
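
A rough sketch of this image-centric sampling: pick N images first, then R/N candidate locations from each. The defaults `N=2` and `R=128` and the helper name `sample_roi_minibatch` are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def sample_roi_minibatch(rois_per_image, N=2, R=128, rng=None):
    """Pick N images, then R // N candidate boxes from each of them.

    `rois_per_image` maps an image id to the number of candidate boxes it has.
    Returns a list of (image_id, roi_index) pairs of length R.
    (Hypothetical helper; N and R are illustrative defaults.)
    """
    if rng is None:
        rng = np.random.default_rng()
    image_ids = rng.choice(list(rois_per_image), size=N, replace=False)
    batch = []
    for image_id in map(int, image_ids):
        n_rois = rois_per_image[image_id]
        # Sample with replacement only if the image has too few candidates.
        picks = rng.choice(n_rois, size=R // N, replace=n_rois < R // N)
        batch.extend((image_id, int(p)) for p in picks)
    return batch

batch = sample_roi_minibatch({0: 2000, 1: 2000, 2: 2000, 3: 50},
                             rng=np.random.default_rng(0))
```

With these numbers the convolutional feature maps are computed for 2 images instead of up to 128, which is where the R/N speed-up comes from.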
  • 30. Some results
  • 31. Fast R-CNN: o Reuse convolutions for different candidate boxes ◦ Compute feature maps only once o Region-of-Interest pooling ◦ Define stride relatively → box width divided by predefined number of “poolings” T (figure: T=5) ◦ Fixed-length vector o End-to-end training! o (Very) accurate object detection o (Much) faster ◦ Less than a second per image o External box proposals needed
  • 32. Faster R-CNN [Ren2016]: o Fast R-CNN → external candidate locations o Faster R-CNN → deep network proposes candidate locations o Slide over the feature map → k anchor boxes per position [Figure: Region Proposal Network]
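
To make "k anchor boxes per position" concrete, here is a small sketch that enumerates anchors over a feature map. The stride of 16 pixels and the particular scales and aspect ratios (giving k = 9) are assumptions chosen for the example; the slide does not specify them.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as (x0, y0, x1, y1) in image pixels.

    stride, scales and ratios are illustrative assumptions; k = len(scales) * len(ratios).
    """
    # Each shape has area scale**2 and height/width ratio r.
    shapes = np.array([(s * np.sqrt(1.0 / r), s * np.sqrt(r))
                       for s in scales for r in ratios])          # (k, 2) as (w, h)
    # Centre of every feature-map cell, mapped back to image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                                   # each (feat_h, feat_w)
    centres = np.stack([cx, cy], axis=-1).reshape(-1, 1, 2)        # (H*W, 1, 2)
    half = shapes[None, :, :] / 2.0                                # (1, k, 2)
    anchors = np.concatenate([centres - half, centres + half], axis=-1)
    return anchors.reshape(-1, 4)

anchors = make_anchors(4, 5)
```

Every one of the 4 × 5 feature-map positions contributes the same k box shapes, centred on that position.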
  • 33. Going Fully Convolutional [LongCVPR2014]: o Image larger than network input → slide the network [Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → fc1 → fc2; “Is this pixel a camel?” Yes! No!]
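
Sliding a network over a larger image can be sketched with a toy example: a single fully connected layer evaluated at every position gives the same result as using its weights as a convolution filter. The one-layer "network" and all sizes below are assumptions made to keep the sketch short.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
k = 6                                   # the toy "network" expects k x k inputs
weights = rng.normal(size=(k, k))       # one fully connected layer, one output unit
image = rng.normal(size=(10, 12))       # larger than the network input

def fc_net(patch):
    """Fully connected layer applied to one k x k patch."""
    return float(weights.ravel() @ patch.ravel())

# Option 1: slide the network and evaluate it at every position.
slid = np.array([[fc_net(image[y:y + k, x:x + k])
                  for x in range(image.shape[1] - k + 1)]
                 for y in range(image.shape[0] - k + 1)])

# Option 2: treat the same weights as a k x k convolution filter.
windows = sliding_window_view(image, (k, k))          # shape (5, 7, k, k)
conv = np.einsum("yxij,ij->yx", windows, weights)
```

The two outputs agree, so the fully connected layers can be rewritten as convolutions and the whole network applied to the full image in one pass.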
  • 38. Fully Convolutional Networks [LongCVPR2014]: o Connect intermediate layers to output
  • 39. Fully Convolutional Networks [LongCVPR2014]: o Output is too coarse ◦ Image size: 500x500, AlexNet input size: 227x227 → Output: 10x10 o How to obtain dense predictions? o Upconvolution ◦ Other names: deconvolution, transposed convolution, fractionally-strided convolution
  • 40. Deconvolutional modules: [Figure panels: convolution (no padding, no strides), image → output; upconvolution (no padding, no strides); upconvolution (padding, strides)] More visualizations: https://github.com/vdumoulin/conv_arithmetic
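
A minimal NumPy sketch of an upconvolution (transposed convolution) without padding, for a single 2-D map. The function name `upconv2d` is hypothetical; the point is only to show how each input value scatters a scaled copy of the kernel into a larger output.

```python
import numpy as np

def upconv2d(x, kernel, stride=1):
    """Transposed convolution of a 2-D map, no padding.

    Every input value adds a copy of `kernel`, scaled by that value,
    into the output at a location determined by the stride.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    return out

x = np.ones((2, 2))
kernel = np.ones((3, 3))
no_stride = upconv2d(x, kernel)              # 2x2 input -> 4x4 output
with_stride = upconv2d(x, kernel, stride=2)  # 2x2 input -> 5x5 output
```

The output side length is (input − 1) × stride + kernel size, so larger strides give a stronger upsampling.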
  • 41. Coarse → Fine Output: [Figure: upconvolution 2x, upconvolution 2x; 7x7, 14x14, 224x224; pixel label probabilities 0.8, 0.1, 0.9 vs. ground truth pixel labels 1, 0, 0; small loss generated; large loss generated (probability much higher than ground truth)]
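
The three example pixels from this slide can be worked through numerically. The slide does not name the loss; the sketch below assumes a per-pixel binary cross-entropy.

```python
import numpy as np

probs = np.array([0.8, 0.1, 0.9])    # predicted pixel label probabilities (from the slide)
truth = np.array([1.0, 0.0, 0.0])    # ground truth pixel labels (from the slide)

# Assumed loss: binary cross-entropy per pixel, -[t log p + (1 - t) log(1 - p)]
per_pixel = -(truth * np.log(probs) + (1 - truth) * np.log(1 - probs))
print(per_pixel.round(4))            # [0.2231 0.1054 2.3026]
```

The first two pixels generate a small loss, while the third, whose predicted probability is much higher than its ground truth label, dominates.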
  • 42. Structured losses
  • 43. Deep ConvNets with CRF loss [Chen, Papandreou 2016]: o Segmentation map is good but not pixel-precise ◦ Details around boundaries are lost o Cast fully convolutional outputs as unary potentials o Consider pairwise potentials between output dimensions
  • 44. Deep ConvNets with CRF loss [Chen, Papandreou 2016]
  • 45. Deep ConvNets with CRF loss [Chen, Papandreou 2016]: o Segmentation map is good but not pixel-precise ◦ Details around boundaries are lost o Cast fully convolutional outputs as unary potentials o Consider pairwise potentials between output dimensions o Include fully connected CRF loss to refine segmentation: total loss $E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j)$ (unary loss + pairwise loss), with $\theta_{ij}(x_i, x_j) \sim w_1 \exp\left(-\alpha \|p_i - p_j\|^2 - \beta \|I_i - I_j\|^2\right) + w_2 \exp\left(-\gamma \|p_i - p_j\|^2\right)$
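
The pairwise term can be evaluated directly for a pair of pixels. In this sketch p is the pixel position and I the pixel colour; the weights and the values of α, β, γ are arbitrary placeholders, and the function only computes the kernel on the right-hand side of the "~" relation, not any label-dependent factor that may multiply it.

```python
import numpy as np

def pairwise_kernel(p_i, p_j, I_i, I_j,
                    w1=1.0, w2=1.0, alpha=0.01, beta=0.01, gamma=0.1):
    """w1 * exp(-alpha |p_i - p_j|^2 - beta |I_i - I_j|^2) + w2 * exp(-gamma |p_i - p_j|^2).

    All parameter values are placeholders for illustration.
    """
    dp = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    dI = np.sum((np.asarray(I_i, float) - np.asarray(I_j, float)) ** 2)
    appearance = w1 * np.exp(-alpha * dp - beta * dI)   # nearby AND similar colour
    smoothness = w2 * np.exp(-gamma * dp)               # nearby, regardless of colour
    return appearance + smoothness

grey, white = (10, 10, 10), (200, 200, 200)
near_same = pairwise_kernel((0, 0), (0, 1), grey, grey)
near_diff = pairwise_kernel((0, 0), (0, 1), grey, white)
far_same = pairwise_kernel((0, 0), (0, 30), grey, grey)
```

Neighbouring pixels with similar colour interact most strongly, which is what lets the CRF sharpen the segmentation along image edges.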
  • 46. Deep ConvNets with CRF loss: Examples
  • 47. One image → Several tasks: o Per image we can predict boundaries, segmentation, detection, … ◦ Why separately? o Solve multiple tasks simultaneously o One task might help learn another better o One task might have more annotations o In real applications we don’t want 7 VGGnets ◦ 1 for boundaries, 1 for normals, 1 for saliency, …
  • 48. Multi-task learning: o The total loss is the sum of the per-task losses o The per-task loss relies on the common weights (VGGnet) and the weights specialized for the task: $\mathcal{L}_{total} = \sum_{task} \mathcal{L}_{task}(\theta_{common}, \theta_{task}) + \mathcal{R}(\theta_{task})$ o One training image might contain annotations for specific tasks only ◦ Only a particular task is “run” for that image o Gradients per image are computed only for the tasks available for that image
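
A small sketch of how the total loss might be assembled when an image carries annotations for only some tasks. It assumes an L2 regularizer for R and applies it per task; the helper name `total_loss` and the dictionary layout are hypothetical.

```python
import numpy as np

def total_loss(task_losses, task_weights, available, reg_strength=1e-4):
    """Sum L_task + R(theta_task) over the tasks annotated for this image.

    task_losses:  dict task -> scalar loss L_task(theta_common, theta_task)
    task_weights: dict task -> array of task-specific weights theta_task
    available:    set of tasks that have annotations for the current image
    R is assumed to be an L2 penalty (an assumption, not stated in the slides).
    """
    total = 0.0
    for task in available:
        l2 = reg_strength * float(np.sum(task_weights[task] ** 2))
        total += task_losses[task] + l2
    return total

losses = {"segmentation": 0.5, "normals": 1.5}
weights = {"segmentation": np.ones(10), "normals": np.ones(10)}
# This image only has segmentation labels, so the normals branch is skipped.
loss = total_loss(losses, weights, available={"segmentation"})
```

Tasks without annotations contribute nothing to the sum, so they also contribute no gradient for that image.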
  • 49. UberNet [Kokkinos2016]
  • 50. UberNet: Backpropagation [Figure panels: naïve backpropagation; UberNet backpropagation]
  • 51. Question: o So far, what have you noticed?
  • 52. Question: o So far, what have you noticed? o Most of this work was done in 2016 ◦ Very fast, very active, very volatile area of research that attracts lots of interest
  • 53. Summary: o What is structured prediction? o Can we repurpose ConvNets for structured prediction? o Structured losses on ConvNets o Multi-task learning with ConvNets
  • 54. Reading material & references: o http://www.deeplearningbook.org/ ◦ Part III: Chapter 16; [Kokkinos2016] Kokkinos. UberNet: Training a 'Universal' Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory, arXiv, 2016; [Rematas2016] Rematas, Ritschel, Fritz, Gavves, Tuytelaars. Deep Reflectance Maps, CVPR, 2016; [Ren2016] Ren, He, Girshick, Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015; [Girshick2015] Girshick. Fast R-CNN, ICCV, 2015; [Wang2015] Wang, Fouhey, Gupta. Designing Deep Networks for Surface Normal Estimation, arXiv, 2015; [Chen2014] Chen, Papandreou, Kokkinos, Murphy, Yuille. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, arXiv, 2014; [He2014] He, Zhang, Ren, Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV, 2014
  • 55. Next lecture: o Deep Learning and Natural Language o Invited lecture given by Prof. Christof Monz