1. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 1
Lecture 6: Structured Prediction with ConvNets
Deep Learning @ UvA
2. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 2
o What do convolutions look like?
o Build on the visual intuition behind Convnets
o Deep Learning Feature maps
o Transfer Learning
Previous Lecture
3. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 3
o What is structured prediction?
o Can we repurpose ConvNets for structured prediction?
o Structured losses on ConvNets
o Multi-task learning with ConvNets
Lecture Overview
4. UVA DEEP LEARNING COURSE
EFSTRATIOS GAVVES
STRUCTURED PREDICTION WITH CONVNETS - 4
What is
structured prediction?
5. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 5
o N-way classification
Standard inference
Dog? Cat? Bike? Car? Plane?
6. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 6
o N-way classification
o Regression
Standard inference
How popular will this movie be in IMDB?
7. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 7
o N-way classification
o Regression
o Ranking
o …
Standard inference
Who is older?
8. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 8
o Do all our machine learning tasks boil down to “single value” predictions?
o Are there tasks where outputs are somehow correlated?
o Is there some structure in these output correlations?
o How can we predict such structures? This is structured prediction
What do they have in common?
9. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 9
o They all make “single value” predictions
o Do all our machine learning tasks boil down to “single value” predictions?
What do they have in common?
10. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 10
o Predict a box around an object
o Images
◦ Spatial location: b(ounding) box
o Videos
◦ Spatio-temporal location: bbox@t, bbox@t+1, …
Object detection
11. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 11
Object Segmentation
12. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 12
Optical flow & motion estimation
13. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 13
Depth estimation
Godard et al., Unsupervised Monocular Depth Estimation with Left-Right Consistency, 2016
14. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 14
Normals and reflectance estimation
Wang et al., Designing deep networks for surface normal estimation, 2015
Rematas et al., Deep Reflectance Maps, 2016
15. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 15
Sentence parsing
16. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 16
Machine translation
17. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 17
o Speech synthesis
o Captioning
o Robot control
o Pose estimation
o …
And many more
18. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 18
o Prediction goes beyond asking for “single values”
o Outputs are complex and output dimensions correlated
What is common?
19. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 19
o Prediction goes beyond asking for “single values”
o Outputs are complex and output dimensions correlated
o Output dimensions have latent structure
o Can we make deep networks return structured predictions?
Structured prediction
21. UVA DEEP LEARNING COURSE
EFSTRATIOS GAVVES
STRUCTURED PREDICTION WITH CONVNETS - 21
ConvNets for
structured prediction
22. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 22
o SPPnet [He2014]
o Fast R-CNN [Girshick2015]
Sliding window on feature maps
23. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 23
o Process the whole image up to conv5
Fast R-CNN: Steps
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
24. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 24
o Process the whole image up to conv5
o Compute possible locations for objects
Fast R-CNN: Steps
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
25. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 25
o Process the whole image up to conv5
o Compute possible locations for objects (some correct, most wrong)
Fast R-CNN: Steps
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
26. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 26
o Process the whole image up to conv5
o Compute possible locations for objects
o Given a single location, the ROI pooling module extracts a fixed-length feature
Fast R-CNN: Steps
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map]
Always 4x4 no matter the size of candidate location
27. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 27
o Process the whole image up to conv5
o Compute possible locations for objects
o Given a single location, the ROI pooling module extracts a fixed-length feature
o Connect to two final layers, 1 for classification, 1 for box refinement
Fast R-CNN: Steps
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → Conv 5 feature map → ROI Pooling Module → two outputs: “Car, dog or bicycle?” and “New box coordinates”]
Always 3x3 no matter the size of candidate location
28. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 28
o Divide the feature map into $T \times T$ cells
◦ The cell size will change depending on the size of the candidate location
Region-of-Interest (ROI) Pooling Module
Always 3x3 no matter the size of candidate location
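The pooling step can be sketched in a few lines of plain Python. This is a minimal illustration, not the Fast R-CNN implementation: the function name `roi_max_pool`, the end-exclusive `(top, left, bottom, right)` box convention and the use of max pooling inside each cell are assumptions made for the example.

```python
def roi_max_pool(feature_map, roi, T=3):
    """Max-pool one candidate box of a 2D feature map into a T x T grid.

    feature_map: list of rows (H x W numbers).
    roi: (top, left, bottom, right) on the feature map, end-exclusive (assumed convention).
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    pooled = []
    for i in range(T):
        # Cell boundaries scale with the box size; floor/ceil keeps every cell non-empty.
        r0 = top + (i * height) // T
        r1 = top + ((i + 1) * height + T - 1) // T
        row = []
        for j in range(T):
            c0 = left + (j * width) // T
            c1 = left + ((j + 1) * width + T - 1) // T
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 6 x 8 feature map whose value encodes its position.
feature_map = [[r * 8 + c for c in range(8)] for r in range(6)]
large = roi_max_pool(feature_map, (0, 0, 6, 6))  # 6 x 6 box -> 3 x 3 output
small = roi_max_pool(feature_map, (2, 3, 4, 5))  # 2 x 2 box -> still 3 x 3 output
```

Both boxes yield a 3x3 output; only the cell size changes, which is what lets a fixed-size layer follow the pooling module.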
29. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 29
o Normally, the samples in a mini-batch are completely random
o Instead, organize mini-batches by ROIs
o 1 mini-batch = $N$ (images) × $R/N$ (candidate locations)
o Feature maps are shared, so training speeds up by a factor of $R/N$
o Mini-batch samples might be correlated
◦ In Fast R-CNN this was not observed to cause problems
Smart fine-tuning
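A minimal sketch of this sampling scheme, assuming candidate boxes are stored per image in a dictionary. The function name `sample_minibatch` and the default values N = 2 and R = 128 are illustrative choices for the example, not values stated on the slide.

```python
import random


def sample_minibatch(rois_per_image, num_images=2, batch_rois=128, seed=0):
    """Pick N images first, then R/N candidate locations from each of them."""
    rng = random.Random(seed)
    per_image = batch_rois // num_images  # R / N
    image_ids = rng.sample(sorted(rois_per_image), num_images)
    batch = []
    for image_id in image_ids:
        candidates = rois_per_image[image_id]
        chosen = rng.sample(candidates, min(per_image, len(candidates)))
        # Every ROI of the same image can reuse that image's feature map.
        batch.extend((image_id, roi) for roi in chosen)
    return batch


# Hypothetical data: 10 images with 200 candidate boxes each (boxes as plain ids).
rois_per_image = {f"img{i}": list(range(200)) for i in range(10)}
batch = sample_minibatch(rois_per_image)
```

With these defaults the 128 samples come from only 2 images, so the convolutional layers run twice per mini-batch instead of 128 times.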
30. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 30
Some results
31. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 31
o Reuse convolutions for different candidate boxes
◦ Compute feature maps only once
o Region-of-Interest pooling
◦ Define the stride relative to the box: box width divided by a predefined number of “poolings” T
◦ Fixed-length vector
o End-to-end training!
o (Very) Accurate object detection
o (Much) faster
◦ Less than a second per image
o External box proposals needed
Fast R-CNN
T=5
32. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 32
o Fast R-CNN: external candidate locations
o Faster R-CNN: the deep network proposes candidate locations
o Slide over the feature map: $k$ anchor boxes per sliding position
Faster R-CNN [Ren2016]
Region Proposal Network
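The anchor layout can be sketched as follows. The stride, scales and aspect ratios are illustrative assumptions (they give $k = 9$ anchors per position), and `make_anchors` is a hypothetical helper, not the Region Proposal Network itself, which additionally scores and refines each anchor.

```python
def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchor boxes per feature-map position.

    Boxes are (x0, y0, x1, y1) in image coordinates.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = width / height, with the area kept at scale ** 2.
                    w = scale * ratio ** 0.5
                    h = scale / ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(feat_h=2, feat_w=3)
```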
33. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 33
o Image larger than the network input: slide the network
Going Fully Convolutional [LongCVPR2014]
[Figure: Conv 1 → Conv 2 → Conv 3 → Conv 4 → Conv 5 → fc1 → fc2]
Is this pixel a camel?
Yes! No!
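A toy sketch of sliding a fixed-input classifier over a larger image. As a stand-in for the whole ConvNet it uses a single linear classifier on a k×k window, which is an assumption made to keep the example short; `slide_classifier` is a hypothetical name.

```python
def slide_classifier(image, weights, bias=0.0, stride=1):
    """Apply a fixed-size linear classifier at every window position.

    image: H x W list of rows; weights: k x k list of rows.
    Returns a score map with one score per window.
    """
    k = len(weights)
    height, width = len(image), len(image[0])
    score_map = []
    for top in range(0, height - k + 1, stride):
        row = []
        for left in range(0, width - k + 1, stride):
            score = bias
            for a in range(k):
                for b in range(k):
                    score += weights[a][b] * image[top + a][left + b]
            row.append(score)
        score_map.append(row)
    return score_map


weights = [[1.0] * 3 for _ in range(3)]  # classifier built for 3 x 3 inputs
exact_fit = slide_classifier([[1.0] * 3 for _ in range(3)], weights)  # 1 x 1 output
larger = slide_classifier([[1.0] * 5 for _ in range(5)], weights)     # 3 x 3 output
```

The same weights give one score for an input of the expected size and a map of scores for a larger input, which is exactly what a convolution computes.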
38. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 38
Fully Convolutional Networks [LongCVPR2014]
o Connect intermediate layers to output
39. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 39
o Output is too coarse
◦ Image size: 500x500, AlexNet input size: 227x227, output: 10x10
o How to obtain dense predictions?
o Upconvolution
◦ Other names: deconvolution, transposed convolution, fractionally-strided convolution
Fully Convolutional Networks [LongCVPR2014]
40. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 40
Deconvolutional modules
[Figure panels, each showing Image and Output:]
Convolution: no padding, no strides
Upconvolution: no padding, no strides
Upconvolution: padding, strides
More visualizations: https://github.com/vdumoulin/conv_arithmetic
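A minimal pure-Python sketch of an upconvolution (transposed convolution) without padding: every input value "stamps" a scaled copy of the kernel onto the output, and the stride controls how far apart the stamps land. The function name is hypothetical, and real frameworks implement this far more efficiently.

```python
def transposed_conv2d(x, kernel, stride=1):
    """Upconvolution with no padding.

    Output size per dimension: (n - 1) * stride + k.
    """
    n_h, n_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (n_h - 1) * stride + k_h
    out_w = (n_w - 1) * stride + k_w
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(n_h):
        for j in range(n_w):
            for a in range(k_h):
                for b in range(k_w):
                    # Overlapping stamps are summed.
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


coarse = [[1, 2], [3, 4]]
ones_2x2 = [[1, 1], [1, 1]]
ones_3x3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
upsampled = transposed_conv2d(coarse, ones_2x2, stride=2)    # 2 x 2 -> 4 x 4, no overlap
overlapping = transposed_conv2d(coarse, ones_3x3, stride=1)  # 2 x 2 -> 4 x 4, overlap
```

With a 2x2 kernel of ones and stride 2 the result is a plain 2x nearest-neighbour upsampling; a learned kernel replaces this fixed interpolation with a trainable one.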
41. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 41
[Figure: Coarse (7x7) → Upconvolution 2x → Fine (14x14) → Upconvolution 2x → Output (224x224)]
Pixel label probabilities: 0.8 0.1 0.9
Ground truth pixel labels: 1 0 0
Small loss generated
Large loss generated (probability much higher than ground truth)
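The three pixels on the slide can be worked through numerically. The slide does not name the loss, so this sketch assumes a per-pixel binary cross-entropy; `pixel_losses` is a hypothetical helper.

```python
import math


def pixel_losses(probabilities, labels):
    """Binary cross-entropy per pixel (assumed loss, see lead-in)."""
    losses = []
    for p, y in zip(probabilities, labels):
        # Penalise the probability mass assigned to the wrong label.
        losses.append(-math.log(p) if y == 1 else -math.log(1.0 - p))
    return losses


# Values from the slide: predicted probabilities and ground-truth labels.
losses = pixel_losses([0.8, 0.1, 0.9], [1, 0, 0])
```

The first two pixels give losses of roughly 0.22 and 0.11, while the third, predicted at 0.9 against a ground truth of 0, gives roughly 2.30.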
42. UVA DEEP LEARNING COURSE
EFSTRATIOS GAVVES
STRUCTURED PREDICTION WITH CONVNETS - 42
Structured losses
43. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 43
o Segmentation map is good but not pixel-precise
◦ Details around boundaries are lost
o Cast fully convolutional outputs as unary potentials
o Consider pairwise potentials between output dimensions
Deep ConvNets with CRF loss [Chen, Papandreou 2016]
44. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 44
Deep ConvNets with CRF loss [Chen, Papandreou 2016]
45. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 45
o Segmentation map is good but not pixel-precise
◦ Details around boundaries are lost
o Cast fully convolutional outputs as unary potentials
o Consider pairwise potentials between output dimensions
o Include Fully Connected CRF loss to refine segmentation
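$E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j)$
Total loss = unary loss (first sum) + pairwise loss (second sum)
$\theta_{ij}(x_i, x_j) \sim w_1 \exp\left(-\alpha \lVert p_i - p_j \rVert^2 - \beta \lVert I_i - I_j \rVert^2\right) + w_2 \exp\left(-\gamma \lVert p_i - p_j \rVert^2\right)$
Deep ConvNets with CRF loss [Chen, Papandreou 2016]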
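The pairwise term can be evaluated directly for two pixels with positions $p$ and colours $I$. This is a sketch of the kernel only: the weights and the values of $\alpha$, $\beta$, $\gamma$ are arbitrary illustrative numbers, and the function name is hypothetical.

```python
import math


def pairwise_potential(p_i, p_j, I_i, I_j,
                       w1=1.0, w2=1.0, alpha=0.01, beta=0.001, gamma=0.1):
    """Appearance kernel (position + colour) plus smoothness kernel (position only)."""
    d_pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))    # squared position distance
    d_color = sum((a - b) ** 2 for a, b in zip(I_i, I_j))  # squared colour distance
    appearance = w1 * math.exp(-alpha * d_pos - beta * d_color)
    smoothness = w2 * math.exp(-gamma * d_pos)
    return appearance + smoothness


red, blue = (255, 0, 0), (0, 0, 255)
same_pixel = pairwise_potential((10, 10), (10, 10), red, red)
near_similar = pairwise_potential((10, 10), (11, 10), red, red)
near_different = pairwise_potential((10, 10), (11, 10), red, blue)
far_similar = pairwise_potential((10, 10), (60, 10), red, red)
```

Nearby pixels with similar colours interact most strongly, so giving them different labels costs the most; this is what sharpens the segmentation along image edges.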
46. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 46
Deep ConvNets with CRF loss: Examples
47. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 47
o Per image we can predict boundaries, segmentation, detection, …
◦ Why separately?
o Solve multiple tasks simultaneously
o One task might help learn another better
o One task might have more annotations
o In real applications we don’t want 7 VGGnets
◦ 1 for boundaries, 1 for normals, 1 for saliency, …
One image, several tasks
48. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 48
o The total loss is the summation of the per task losses
o The per task loss relies on the common weights (VGGnet) and the weights specialized for the task
$\mathcal{L}_{total} = \sum_{task} \mathcal{L}_{task}(\theta_{common}, \theta_{task}) + \mathcal{R}(\theta_{task})$
o One training image might contain annotations for specific tasks only
◦ Only a particular task is “run” for that image
o Gradients per image are computed for tasks available for the image only
Multi-task learning
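A minimal sketch of the summed loss when an image carries annotations for only some tasks. The task names, the squared-error loss and the omission of the regulariser $\mathcal{R}$ are simplifying assumptions for the example.

```python
def squared_error(prediction, target):
    """Stand-in per-task loss (assumption; each task would normally have its own)."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


def total_loss(predictions, annotations, task_losses):
    """Sum the per-task losses over the tasks annotated for this image only."""
    total = 0.0
    for task, loss_fn in task_losses.items():
        if task not in annotations:
            continue  # no annotation: the task contributes no loss and no gradient
        total += loss_fn(predictions[task], annotations[task])
    return total


task_losses = {"boundaries": squared_error, "normals": squared_error}
predictions = {"boundaries": [1.0, 0.0], "normals": [0.5, 0.5]}

only_boundaries = total_loss(predictions, {"boundaries": [0.0, 0.0]}, task_losses)
both_tasks = total_loss(
    predictions, {"boundaries": [0.0, 0.0], "normals": [0.5, 1.5]}, task_losses
)
```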
49. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 49
Ubernet [Kokkinos2016]
50. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 50
Ubernet: Backpropagation
Naïve backpropagation Ubernet backpropagation
51. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 51
o So far, what have you noticed?
Question
52. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 52
o So far, what have you noticed?
o Most of these works are from 2016
◦ Very fast, very active, very volatile area of research that attracts lots of interest
Question
53. UVA DEEP LEARNING COURSE
EFSTRATIOS GAVVES
STRUCTURED PREDICTION WITH CONVNETS - 53
Summary
o What is structured prediction?
o Can we repurpose ConvNets for structured prediction?
o Structured losses on ConvNets
o Multi-task learning with ConvNets
54. UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES STRUCTURED PREDICTION WITH CONVNETS- 54
o http://www.deeplearningbook.org/
◦ Part III: Chapter 16
[Kokkinos2016] Kokkinos. UberNet: Training a 'Universal' Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory, arXiv, 2016
[Rematas2016] Rematas, Ritschel, Fritz, Gavves, Tuytelaars. Deep Reflectance Maps, CVPR, 2016
[Ren2016] Ren, He, Girshick, Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015
[Girshick2015] Girshick. Fast R-CNN, ICCV, 2015
[Wang2015] Wang, Fouhey, Gupta. Designing Deep Networks for Surface Normal Estimation, arXiv, 2015
[Chen2014] Chen, Papandreou, Kokkinos, Murphy, Yuille. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, arXiv, 2014
[He2014] He, Zhang, Ren, Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV, 2014
Reading material & references
55. UVA DEEP LEARNING COURSE
EFSTRATIOS GAVVES
STRUCTURED PREDICTION WITH CONVNETS - 55
Next lecture
o Deep Learning and Natural Language
o Invited lecture given by Prof. Christof Monz