CVPR 2017 Summary
Assaf Mushinsky
Chief Scientist & Co-founder
About the speaker
Assaf Mushinsky
● Co-founder & Chief Scientist
● Breakthrough MSc research with Prof. Lior Wolf
● Computer Vision & Deep Learning expert
● Key technical roles at Samsung & Eyesight
Brodmann17
● Founded in 2016
● Raised $2M
● Team: 10 people → 6 researchers (PhDs, MScs)
● Core Technology: Deep Learning for Edge Devices
Looking for brilliant
researchers
cv@brodmann17.com / amir@brodmann17.com
CVPR 2017
● Huge conference!
○ 4950 registrations
○ 783 accepted papers (out of 2620 valid submissions)
○ 215 orals
○ Also…
■ Tutorials!
■ Workshops!
● Papers available online.
○ Many of them were published many months ago.
● Videos available on the YouTube channel:
○ https://www.youtube.com/channel/UC0n76gicaarsN_Y9YShWwhw
Agenda
● What are we going to talk about today?
○ Object detection and segmentation: people, faces and pose estimation.
○ New and exciting network architectures
○ Efficient deep learning: optimization and cascades
○ Multiscale information
○ Data augmentation, generation and synthesis
○ “Swiss-knife” network.
● What are we not going to talk about today?
○ Faster R-CNN: https://arxiv.org/abs/1506.01497
○ ResNet: https://arxiv.org/abs/1512.03385
Object Detection
Speed/accuracy trade-offs for modern convolutional object
detectors (Google)
● Presents fair comparison of the leading methods for object detection in terms
of speed and accuracy.
● Single code and training framework for fair comparison.
● Same hardware: Nvidia GeForce GTX Titan X GPU card.
● Multiple hyper-parameter configurations.
● Multiple network architectures.
● Paper: https://arxiv.org/abs/1611.10012
● Code: https://github.com/tensorflow/models/tree/master/object_detection
YOLO9000: Better, Faster, Stronger - Joseph Redmon
● Fast and accurate object detection.
● They improve on their first version of YOLO
by making it better, faster and stronger.
○ Better: YOLOv1 was fast but not accurate.
This one will be more accurate.
○ Faster: New network for faster run time.
○ Stronger: Learn to detect 9000 object
classes.
● Paper: https://arxiv.org/abs/1612.08242
● Code: http://pjreddie.com/yolo9000/
● Won best paper honorable mention award.
YOLO9000: Better, Faster, Stronger - Joseph Redmon
● Batch normalization (+2%)
● High resolution classifier (+4%)
○ Train ImageNet classifier at 224x224
○ Fine-tune ImageNet classifier at 448x448
○ Train detector at 448x448
● Convolutional with anchor boxes
○ YOLOv1 used FC for prediction.
○ Remove FC and predict anchors (-0.3%)
■ But… recall increases 81%→88%
● Select anchors using k-means
● Direct location prediction (+5%)
○ Sigmoid for constrained bounding box
prediction instead of unconstrained as in RPN.
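A minimal sketch of the constrained prediction above, assuming the parameterization described in the YOLO9000 paper: a sigmoid keeps the box centre inside its grid cell, and the anchor (prior) size is scaled exponentially. The function and argument names are illustrative, not taken from the Darknet code.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cell_x, cell_y, prior_w, prior_h):
    """Turn raw network outputs into a box (centre x, centre y, w, h) in grid-cell units."""
    bx = cell_x + sigmoid(tx)      # centre is constrained to lie inside the cell
    by = cell_y + sigmoid(ty)
    bw = prior_w * math.exp(tw)    # anchor size is rescaled, never negative
    bh = prior_h * math.exp(th)
    return bx, by, bw, bh

# All-zero outputs give the cell centre and the unmodified anchor size.
box = decode_box(0.0, 0.0, 0.0, 0.0, cell_x=3, cell_y=5, prior_w=2.0, prior_h=4.0)
```

However large the raw offset becomes, the centre cannot leave its cell, which is the constraint the unconstrained RPN-style offsets lack.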
YOLO9000: Better, Faster, Stronger - Joseph Redmon
● Fine-grained features (+1%)
○ How to combine features with features from
previous layer?
○ Passthrough layer
■ Take previous 26x26x512 and stack
adjacent features into different
channels, get 13x13x2048.
■ Concatenate with original features
● Multi-scale training (+1%)
○ Every 10 batches randomly choose a new
image dimension size.
○ Forces the network to learn to predict well
across a variety of input dimensions.
● High resolution detector (+2%)
○ Use 544x544 instead of 416x416
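The passthrough layer is essentially a space-to-depth rearrangement. The sketch below is pure Python on nested lists indexed [row][col][channel]; the order in which the four neighbours are stacked is an assumption and may differ from the Darknet implementation.

```python
def space_to_depth(x, block=2):
    """Stack each block x block spatial neighbourhood into the channel axis."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            stacked = []
            for di in range(block):
                for dj in range(block):
                    stacked.extend(x[i + di][j + dj])
            row.append(stacked)
        out.append(row)
    return out

# Toy 4x4x1 map standing in for the 26x26x512 tensor.
fmap = [[[r * 4 + c] for c in range(4)] for r in range(4)]
packed = space_to_depth(fmap)   # 2x2x4, same values, half the resolution
```

With block=2 the spatial size halves and the channel count quadruples, which is how 26x26x512 becomes 13x13x2048.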
YOLO9000: Better?, Faster, Stronger - Joseph Redmon
YOLO9000: Better, Faster, Stronger - Joseph Redmon
● We want detection to be accurate but we
also want it to be fast.
● Due to multiscale training, detectors can be
applied at different scales for
speed/accuracy trade off.
● Use Darknet-19 instead of VGG16.
○ Mostly 3x3 convolutions.
○ Like NIN: Use 1x1 filters to compress the
feature representation between 3x3 convs.
YOLO9000: Better, Faster, Stronger - Joseph Redmon
● How to learn detection for 9000 classes?
● During training mix images from both detection and
classification datasets.
○ For detection images, use full backprop.
○ For classification images, use only classification part
for backprop.
● Hierarchical classification
○ ImageNet labels are pulled from WordNet.
○ Simplify the problem by building a hierarchical tree
from the concepts in ImageNet.
○ Perform classification using conditional probabilities.
● This formulation also works for detection
○ Instead of assuming that every anchor contains an
object, they use objectness predictor.
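A small sketch of classification with conditional probabilities on a tree. The tree and the numbers are hypothetical; the point is only that the absolute probability of a node is the product of the conditionals on the path to the root. For detection, the root term would be the objectness prediction mentioned above.

```python
# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "terrier": "dog",
    "hound": "dog",
}

def absolute_probability(node, conditional):
    """Multiply P(node | parent) along the path from node up to the root."""
    prob = 1.0
    while node is not None:
        prob *= conditional[node]
        node = PARENT[node]
    return prob

# conditional[n] = P(n | parent(n)); siblings sum to 1.
conditional = {"animal": 1.0, "dog": 0.5, "cat": 0.5, "terrier": 0.25, "hound": 0.75}
p_terrier = absolute_probability("terrier", conditional)   # 0.25 * 0.5 * 1.0
```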
Feature Pyramid Networks for Object Detection (FAIR)
● https://arxiv.org/abs/1612.03144
RON: Reverse Connection with Objectness Prior Networks for
Object Detection
● Same idea as “Feature Pyramid Networks for Object Detection”
● https://arxiv.org/abs/1707.01691
Accurate Single Stage Detector Using Recurrent Rolling
Convolution
● Same idea as “Feature Pyramid Networks for Object Detection”
● https://arxiv.org/abs/1704.05776
Object Detection Circa 2007
Source: Ross Girshick’s object detection tutorial in CVPR 2017 http://deeplearning.csail.mit.edu/instance_ross.pptx
Object Detection Today
Source: Ross Girshick’s object detection tutorial in CVPR 2017 http://deeplearning.csail.mit.edu/instance_ross.pptx
Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
● Instance segmentation with pose estimation
for people.
● Extends Faster R-CNN by adding a new branch
for the instance mask task.
● Pose estimation can be added by simply
adding an additional branch.
● SOTA accuracy on detection, segmentation
and pose estimation at 5 FPS on GPU.
● https://arxiv.org/abs/1703.06870
● Girshick won young researcher award.
Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
● RoiPool
○ Quantization breaks pixel-to-pixel alignment
○ Too coarse and not good for fine spatial
information required for mask.
● RoiAlign
○ Bilinearly sample the proposal region and
avoid the quantization.
○ Smoothly normalize features and predictions
into coordinate frame free of scale and
aspect ratio
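The core operation behind RoiAlign is bilinear sampling at fractional coordinates instead of rounding to the nearest cell. A minimal sketch on a single-channel feature map, assuming the sample point lies inside the map; the number and placement of sample points per bin is not shown.

```python
import math

def bilinear_sample(fmap, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x) location."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(fmap) - 1)      # clamp at the border
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * fmap[y0][x0] + wx * fmap[y0][x1]
    bottom = (1 - wx) * fmap[y1][x0] + wx * fmap[y1][x1]
    return (1 - wy) * top + wy * bottom

fmap = [[0.0, 1.0],
        [2.0, 3.0]]
center = bilinear_sample(fmap, 0.5, 0.5)   # average of the four cells
```

RoiPool would instead snap (0.5, 0.5) to an integer cell, which is the quantization that breaks pixel-to-pixel alignment.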
Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
● Backbone architecture
○ ResNet
○ ResNeXt
○ FPN
● Mask representation
○ FC vs. Convolutional
○ Multinomial vs. Independent Masks:
softmax vs. sigmoid
○ Class-Specific vs. Class-Agnostic Masks:
almost same accuracy
● Multi-task learning
○ Mask task improves object detection
accuracy.
○ Keypoint task reduces object detection
accuracy.
Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
● Pose estimation
○ Simply add an additional branch.
○ Model a keypoint’s location as a one-hot mask, and
adopt Mask R-CNN to predict K masks.
○ Experiments are mainly to demonstrate the
generality of the Mask R-CNN framework.
○ RoiAlign improves this task’s accuracy as well.
Learning non-maximum suppression
● Object detectors are mostly trained
end-to-end, except for the NMS.
○ NMS is still fully hand-crafted, and forces a
trade-off between recall and precision.
● Training loss is not evaluation loss.
○ Training is performed without NMS
○ During evaluation, multiple detections for
same object count as false positives.
● https://arxiv.org/abs/1705.02950
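For reference, this is the hand-crafted greedy NMS that the paper sets out to replace, not the learned method itself. A minimal sketch with boxes as (x1, y1, x2, y2) and a fixed IoU threshold; the threshold value is the usual illustrative default.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, drop everything overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)   # the second box overlaps the first and is dropped
```

The single threshold is where the recall/precision trade-off comes from: a low value merges nearby objects, a high value lets duplicates through.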
Learning non-maximum suppression
● Additional blocks that:
○ Encode pairwise information.
○ For each detection, pool information from all pairings.
○ Update feature vector.
○ Repeat.
● New loss:
○ Only one positive candidate per object.
○ Replaces the current practice of treating all detections with IoU>50% as positives.
Focal Loss for Dense Object Detection (FAIR)
● Two stage detectors are usually the most
accurate.
● Single stage detectors are simpler and
usually faster.
● Reshaping the cross entropy loss to weight
down well classified samples can improve
the accuracy of single stage detectors.
● This approach is shown to be better than
online hard negative mining.
● Architecture is based on FPN.
● https://arxiv.org/abs/1708.02002
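A sketch of the reshaped cross entropy for a single binary prediction. The defaults gamma=2 and alpha=0.25 are the values reported in the focal loss paper; the function names are illustrative.

```python
import math

def cross_entropy(p, y):
    """Standard binary cross entropy for foreground probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Cross entropy scaled by (1 - p_t)^gamma, so well-classified samples count less."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)   # confident and correct: almost no loss
hard = focal_loss(0.10, 1)   # confident and wrong: loss stays large
```

Setting gamma=0 recovers an alpha-weighted cross entropy, so the modulating factor is the only change.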
Scale aware face detection
● Detection of small objects is
computationally expensive.
● But what if there are no small objects in an
image? Why should we waste computation
on scanning those scales?
● We can divide face detection into two tasks
○ Estimate the scale of faces in a given
image.
○ For each scale, resize to fixed scale and
apply detection.
● https://arxiv.org/abs/1706.09876
Pose Estimation
Realtime Multi-Person 2D Pose Estimation Using Part Affinity
Fields
● Multi-person pose estimation is difficult
○ Unknown number of people
○ Interactions between people make the
association of parts difficult.
○ Runtime complexity tends to grow with the
number of people in the image.
● The proposed architecture is designed to
jointly learn part locations and their
association.
● Paper: https://arxiv.org/abs/1611.08050
● Code: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
Realtime Multi-Person 2D Pose Estimation Using Part Affinity
Fields
● Two branches:
○ Part location confidence maps.
○ Part affinity fields.
● Multi-stage
○ Every stage gets the output of the previous stage
as well as the input image.
○ Output is refined over the different stages,
allowing resolution of conflicts.
● Multi-Person Parsing using PAFs
Towards Accurate Multi-person Pose Estimation in the Wild
(Google)
● Two stage cascade model:
○ Apply a Faster-RCNN person detector to
produce a bounding box around each
candidate person instance.
○ Apply a pose estimator to the image crop
extracted around each candidate person
instance in order to localize its keypoints
and re-score the corresponding proposal.
● https://arxiv.org/abs/1701.01779
● They have a newer version which works
without an object detector and is very similar
to the part affinity field method.
● Demo for newer version was presented at
the conference.
LCR-Net: Localization-Classification-Regression for Human
Pose
● https://www.researchgate.net/publication/315867122_LCR-Net_Localization-Classification-Regression_for_Human_Pose
Coarse-To-Fine Volumetric Prediction for Single-Image 3D
Human Pose
● Common approaches have drawbacks:
○ Estimating 3D pose by regression of (x,y,z)
○ 2D pose map and 3D refinement
● Solution:
○ 3D pose map estimation.
● https://arxiv.org/abs/1611.07828
ArtTrack: Articulated Multi-Person Tracking in the Wild
● How to use temporal information for
multi-person pose tracking?
○ Build a spatio-temporal graph whose edges
connect different parts in the same frame
and the same part in different frames.
● Paper: https://arxiv.org/abs/1612.01465
● Dataset: http://www.posetrack.net
● Code:
https://github.com/eldar/pose-tensorflow
Let’s take a break
When we get back:
Award winning architectures
Efficient neural networks
A single network that does everything
Architectures
Densely Connected Convolutional Networks
● Residual connections in ResNet allowed
networks to be substantially deeper, more
accurate, and efficient to train.
● Dense connections take this idea further by
connecting every two layers in a block using
channel wise concatenation.
● Paper: https://arxiv.org/abs/1608.06993
● Code: https://github.com/liuzhuang13/DenseNet
● Memory efficient implementation:
https://arxiv.org/abs/1707.06990
● Won best paper award
Densely Connected Convolutional Networks
● Residual connections
● Dense connections
● Transition layers
○ The dense connectivity can’t be applied when
scale changes.
○ This is why convolution and pooling layers are
added between dense blocks.
Densely Connected Convolutional Networks
● Growth rate
○ Every layer produces k outputs
○ The input for the l-th layer is k0 + k×(l-1) feature maps,
where k0 is the number of channels entering the block.
○ To prevent the network from growing too wide and
to improve the parameter efficiency k has to be
limited to a small integer.
○ Experiments show k=12 is sufficient to obtain
state-of-the-art results.
● Bottleneck layers
○ Even with a small growth rate, the number of
inputs for some layers can get very large.
○ 1×1 convolution is used to reduce the number of
features to 4k.
● Compression
○ To further improve model compactness, the
transition layer reduces the m feature maps of
its input to m/2.
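The channel bookkeeping above can be checked with a few lines of arithmetic. In this sketch k0 (the number of channels entering the block) and the block depth are made-up values for illustration; the 4k bottleneck width and the halving in the transition layer follow the bullets above.

```python
def dense_block_inputs(k0, k, num_layers):
    """Input channel count seen by layers 1..num_layers of one dense block."""
    return [k0 + k * (l - 1) for l in range(1, num_layers + 1)]

def bottleneck_width(k):
    """The 1x1 convolution reduces whatever comes in to 4k feature maps."""
    return 4 * k

def transition_output(m, compression=0.5):
    """The transition layer keeps only a fraction of its m input feature maps."""
    return int(m * compression)

k0, k, depth = 24, 12, 6                       # illustrative values
inputs = dense_block_inputs(k0, k, depth)      # grows linearly with depth
block_output = k0 + k * depth                  # everything concatenated
after_transition = transition_output(block_output)
```

Even with k=12 the last layer of a long block sees many inputs, which is why the 1x1 bottleneck is applied before each 3x3 convolution.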
Densely Connected Convolutional Networks
● Stronger gradient flow.
● Parameter and computational efficiency.
● Diversified features due to concatenation of all
previous features.
● Maintains both high & low complexity features.
● Less prone to overfitting than ResNet when large
amounts of data aren’t available. Works better
even without augmentation.
Multi-Scale Dense Convolutional Networks
for Efficient Prediction
● Multi-scale networks with multiple classifiers.
● Multiple classifiers allow for cascaded
computation.
● Paper: https://arxiv.org/abs/1703.09844
Dual Path Networks
● ResNet enables feature reuse.
● DenseNet enables exploration of new
features.
● SOTA accuracy on ImageNet.
● https://arxiv.org/abs/1707.01629
● ImageNet classification
Dual Path Networks
● Pascal detection and segmentation.
Deep Roots: Improving CNN Efficiency with Hierarchical Filter
Groups
● Filter group:
○ In normal convolutional layers, all the filters
process all inputs features.
○ Instead, break the filters and input features into
groups.
● Hierarchical filter groups:
○ Start with large number of groups and reduce
them as the model goes deeper.
● Didn’t compare to non-hierarchical filter groups.
● Reduces:
○ Model size
○ Running time
○ Memory consumption
● Can even improve accuracy
● https://arxiv.org/abs/1605.06489
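A quick way to see why filter groups shrink the model: each filter only looks at its own group's share of the input channels. The sketch counts weights only (no bias); the layer sizes are arbitrary examples.

```python
def conv_params(c_in, c_out, kernel=3, groups=1):
    """Weight count of a kernel x kernel convolution split into filter groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    return kernel * kernel * (c_in // groups) * c_out

dense = conv_params(256, 256)               # every filter sees all 256 inputs
grouped = conv_params(256, 256, groups=8)   # every filter sees 32 inputs
```

The weight count, and with it the multiply-adds, drops by the number of groups, at the price of no interaction between groups inside the layer.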
Xception: Deep Learning With Depthwise Separable
Convolutions
● Replace Inception modules with
depthwise separable convolutions.
● Regular convolutions and depthwise
separable convolutions lie at the two
extremes of a discrete spectrum,
with Inception modules as an
intermediate point in between.
● Slightly outperforms Inception V3
on the ImageNet dataset
● Significantly outperforms
Inception V3 on a larger
classification dataset with 350
million images and 17,000 classes
● https://arxiv.org/abs/1610.02357
Aggregated Residual Transformations for Deep Neural
Networks (ResNeXt)
● ResNet + Inception = ResNeXt
● 2nd place ILSVRC 2016
● Paper: https://arxiv.org/abs/1611.05431
● Code: https://github.com/facebookresearch/ResNeXt
Aggregated Residual Transformations for Deep Neural
Networks (ResNeXt)
● This is actually equivalent to filter groups.
ShuffleNet: An Extremely Efficient Convolutional
Neural Network for Mobile Devices
● Minimizes the damage of filter groups by
shuffling features between groups
● https://arxiv.org/abs/1707.01083
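The shuffle itself is just a reshape, transpose and flatten over the channel axis. A minimal sketch on a flat list of channel labels; a real implementation would apply the same permutation to a tensor.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so every group in the next layer sees every current group."""
    n = len(channels) // groups
    if n * groups != len(channels):
        raise ValueError("channel count must be divisible by the number of groups")
    # View as (groups, n), transpose to (n, groups), then flatten.
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

shuffled = channel_shuffle(["a0", "a1", "b0", "b1", "c0", "c1"], groups=3)
```

After the shuffle, each consecutive pair fed to the next grouped convolution mixes channels from different source groups.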
Feedback Networks
● Iterative processing of the input
● Improves on previous iteration using
previous feature and input.
● https://arxiv.org/abs/1612.09508
Dilated Residual Networks
● Classification networks gradually reduce
the size of the activations until we are left
with a single feature vector.
● Classification is usually a proxy task used
to pretrain networks before they are
transferred to other applications.
● We lose the spatial information that might
be beneficial to tasks such as localization
or segmentation.
● https://arxiv.org/abs/1705.09914
Dilated Residual Networks
● We can remove the pooling layers and
avoid the dimension reduction.
● But! Removing the pooling layers will
reduce the network’s receptive field and
hurt accuracy.
● How can we avoid spatial information loss
and still have a large receptive field?
Dilated Residual Networks
● Dilated convolutions:
○ Sparse filter.
○ Same output as filter with stride.
○ Doesn’t skip any input data.
○ Doesn’t change the data size.
● Advantages:
○ Increase receptive field.
○ Increase spatial information.
○ Doesn’t increase the number of network
parameters.
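A 1D sketch of a dilated convolution with zero padding. It illustrates the points above: the output has the same length as the input, the kernel still has three weights, and the receptive field grows with the dilation. The padding scheme is an assumption chosen to keep the size fixed.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Same-size 1D cross-correlation whose taps are `dilation` samples apart."""
    k = len(kernel)
    span = dilation * (k - 1)                  # distance between first and last tap
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    return [
        sum(kernel[t] * padded[i + t * dilation] for t in range(k))
        for i in range(len(signal))
    ]

def receptive_field(kernel_size, dilation):
    """Number of input positions covered by one output value."""
    return dilation * (kernel_size - 1) + 1

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
out = dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2)
```

Every input sample still contributes to some output, unlike a strided filter that skips positions.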
Dilated Residual Networks
● ResNet to Dilated Residual Network (DRN)
○ Remove stride, compensate with dilation
for groups 4 and 5.
○ Don’t need to apply to 1,2 and 3 because
stride 8 is known to preserve most of the
information.
○ Original output size was 7×7, new output
size is 28×28.
○ Improves recognition of small objects.
Dilated Residual Networks
● DRN-B-26
○ Replaces early pooling with residual blocks.
○ Adds residual blocks with reduced dilation
at the end of the network.
● DRN-C-26
○ Removes residual connections from some
of the added blocks.
○ Added layers in DRN-B-26 didn’t remove
gridding artifacts due to residual
connections which propagated artifacts.
Dilated Residual Networks
● ImageNet Classification
○ DRN-A outperforms ResNets with the
same number of layers and parameters.
○ Each DRN-C significantly outperforms the
corresponding DRN-A, showing degridding
is beneficial.
Dilated Residual Networks
● ImageNet weakly-supervised localization
○ Lower is better.
○ DRN-C-26 outperforms DRN-A-50 despite
lower depth and classification accuracy.
○ DRN-C-26 also outperforms ResNet-101.
Dilated Residual Networks
● Semantic Segmentation
○ ResNet-101 Achieves 66.6 mean IoU
Efficient Deep Learning
Not All Pixels Are Equal: Difficulty-aware Semantic
Segmentation via Deep Layer Cascade
● Deep layer cascade method that improves the accuracy and speed of semantic segmentation.
● The model is initially trained as a multi-loss model.
● A second training stage jointly fine-tunes the model as a cascade.
● Runs ~15 FPS
● https://arxiv.org/abs/1704.01344
Mimicking Very Efficient Network for Object Detection
● Train small network to mimic the output of a
larger one.
○ The large network acts as supervision for
training the smaller network.
○ The small network is trained using L2 loss to
mimic the output of the larger one.
○ Can be expanded to two-stage mimicking for
training efficient Faster R-CNN / R-FCN.
● Experiments
○ R-FCN w/ Inception on Caltech: 7.15
○ R-FCN w/ Inception/2 on Caltech: 8.88
○ R-FCN w/ Inception/2 mimic on Caltech: 7.31
● http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Mimicking_Very_Efficient_CVPR_2017_paper.pdf
Spatially Adaptive Computation Time for Residual Networks
● Automatically learn which pixel to compute
residual functions for and which to simply
keep current value.
● Each layer outputs a confidence which
accumulates until it passes a threshold; then
computation is stopped for this pixel.
● Paper: https://arxiv.org/abs/1612.02297
● Code: https://github.com/mfigurnov/sact
LCNN: Lookup-based Convolutional Neural Network (XNOR.AI)
● Create a dictionary for convolutions.
● Convolution filters are weighted combinations of dictionary entries.
● https://arxiv.org/abs/1611.06473
Binarized Neural Network with Separable Filters
● They build on Hubara’s work on binarized NNs.
● Breaking 3x3 filters into 1x3 and 3x1 filters.
● 30% faster, minor drop in accuracy.
● https://arxiv.org/abs/1707.04693
Data
Learning From Simulated and Unsupervised Images Through
Adversarial Training (Apple)
● Real training data is expensive. Can we use
simulated data?
○ Simulated data is cheap and we don’t need
to annotate it.
○ There is a gap between simulated and real
images.
● How can we make synthetic images look
more real?
● How can we do that without changing the
properties of the synthetic images?
● They use this method for eye gaze
estimation and hand pose estimation.
● Paper: https://arxiv.org/abs/1612.07828
● Won best paper award.
Learning From Simulated and Unsupervised Images Through
Adversarial Training (Apple)
● They train GAN to modify the synthetic
image to look more real.
○ The generator modifies the image to fool
the discriminator.
○ The discriminator tries to classify real vs.
synthetic images.
● They make only small local changes, since the
refiner is a ResNet with a small receptive field.
● The loss of the discriminator is local
because it ends with a loss map instead of
single loss.
● Humans got 80% on synthetic vs real
images but only 51% accuracy on refined
vs. real images.
A-Fast-RCNN: Hard Positive Generation via Adversary for
Object Detection
● Adversarial network that generates examples with occlusions and
deformations.
● https://arxiv.org/abs/1704.03414
Training Object Class Detectors With Click Supervision
● 9x faster labeling than full
supervision.
● No comparison to the state of the art in terms
of accuracy.
● Two-click validation helps determine the
scale of the object.
● Start with annotator verification process
using pre-labeled test set.
● https://arxiv.org/abs/1704.06189
Making Deep Neural Networks Robust to Label Noise: A Loss
Correction Approach
● Labels are expensive to obtain because
they require human labeling.
● They want to avoid the need for a set of
clean labels, or knowledge of the noise
statistics.
● During training, correct the loss function by
reweighting the loss according to
estimated noise between classes.
● https://arxiv.org/abs/1609.03683
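One way to picture a loss correction is to push the model's clean-class probabilities through an estimated noise matrix before comparing with the noisy label. This is a sketch of a forward-style correction under that assumption; the matrix T and the probabilities are made up, and the paper's exact procedure (including how T is estimated) may differ.

```python
import math

def forward_corrected_loss(probs, noisy_label, T):
    """Cross entropy against the noisy label, with T[i][j] = P(noisy=j | true=i)."""
    num_classes = len(probs)
    noisy_probs = [
        sum(probs[i] * T[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]
    return -math.log(noisy_probs[noisy_label])

# Hypothetical noise: class 0 is mislabeled as class 1 in 20% of cases.
T = [[0.8, 0.2],
     [0.0, 1.0]]
probs = [0.9, 0.1]                       # model is confident in class 0
corrected = forward_corrected_loss(probs, 1, T)
uncorrected = -math.log(probs[1])
```

The corrected loss penalizes the model less for disagreeing with a label that the noise model says is often wrong.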
Harvesting Multiple Views for Marker-Less 3D Human Pose
Annotations
● Use pretrained pose net to estimate
probability map for each part.
● Do this for multiple views.
● Fuse information into single pose
estimation.
● Use this pose as new ground truth for
training.
● Automatic annotations help to improve
accuracy.
One model to rule them all
Ubernet: Training a Universal Convolutional Neural Network
● Computer vision involves a host of tasks,
such as boundary detection, semantic
segmentation, surface estimation, object
detection, image classification.
● In a joint application, running a network for
each task is infeasible.
● Can one network solve all of our computer
vision tasks?
○ Of course. Naively combine multiple
networks and get a single network.
○ Can we do better?
● https://arxiv.org/abs/1609.02132
Ubernet: Training a Universal Convolutional Neural Network
● How do we train multiple tasks without having a single dataset for all tasks?
Ubernet: Training a Universal Convolutional Neural Network
● Architecture
○ Based on VGG16.
○ A minimal number of additional, task-specific layers.
○ Skip layers to combine the best features for every task.
○ Skip-layer connections are normalized using batch norm
○ Multi-resolution CNN
○ Atrous convolution
● Training loss
○ Adapt loss per sample.
■ Zero loss when ground truth is missing.
○ Asynchronous SGD
■ Accumulate gradients for each task
■ Only update weights after seeing enough
samples for a specific task.
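The accumulate-then-update idea can be sketched with a small helper. Gradients are scalars here purely for illustration, and the class name and threshold are hypothetical; samples without ground truth for a task simply never call add for that task.

```python
class TaskGradientAccumulator:
    """Collect per-task gradients and release an averaged update only once
    enough samples carrying that task's labels have been seen."""

    def __init__(self, tasks, samples_per_update):
        self.samples_per_update = samples_per_update
        self.sums = {t: 0.0 for t in tasks}
        self.counts = {t: 0 for t in tasks}

    def add(self, task, gradient):
        """Return the averaged gradient if an update is due, otherwise None."""
        self.sums[task] += gradient
        self.counts[task] += 1
        if self.counts[task] < self.samples_per_update:
            return None
        update = self.sums[task] / self.counts[task]
        self.sums[task], self.counts[task] = 0.0, 0
        return update

acc = TaskGradientAccumulator(["detection", "segmentation"], samples_per_update=2)
first = acc.add("detection", 1.0)       # not enough detection samples yet
other = acc.add("segmentation", 5.0)    # tracked independently
second = acc.add("detection", 3.0)      # second detection sample triggers an update
```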
Ubernet: Training a Universal Convolutional Neural Network
● Low memory back-propagation
Pascal In Detail - Make Pascal Great Again!
● https://sites.google.com/view/pasd
● Measure the progress in image
understanding as reflected in a diverse set
of visual tasks.
● Single-Task Challenges
○ Image Classification, Object Detection,
Semantic Segmentation, Instance
Segmentation, Object Part Segmentation,
Objectness, Boundary Detection, Occlusion
Recognition, Human Keypoint Estimation,
Human Action Recognition.
● Multi-Task Challenges
○ Boxes to Points Triathlon: Object Detection,
Instance Segmentation, Keypoint
Estimation
○ PASCAL++ Triathlon: Image Classification,
Object Detection, Semantic Segmentation
○ Humans in Detail Triathlon: Human Parts,
Keypoints, Action
● PASCAL Decathlon:
○ All 10 tasks
Taster - Visual Domain Decathlon
● http://www.robots.ox.ac.uk/~vgg/decathlon/
● Solve ten image classification problems simultaneously.
a. Aircraft
b. CIFAR-100
c. Daimler pedestrian
d. Describable textures
e. German traffic signs
f. ImageNet
g. VGG-Flowers
h. Omniglot
i. SVHN
j. UCF101 Dynamic Images
Learning multiple visual domains with residual adapters
● Primary goal is to develop neural network architectures that can work well in a
multiple-domain setting.
● Learn adapters that can be replaced for specific tasks.
● https://arxiv.org/abs/1705.08045
Incremental Learning Through Deep Adaptation (Amir
Rosenfeld)
● It is often desirable to be able to add new capabilities without hindering
performance of already learned tasks.
● Fully preserves performance on the original task, with only a small increase
(around 20%) in the number of required parameters.
○ Other methods typically double the number of parameters.
● The learned architecture can be controlled to switch between various learned
representations, enabling a single network to solve a task from multiple
different domains.
● https://arxiv.org/abs/1705.04228
● Slides: https://sites.google.com/view/amirrosenfeld
● Challenge winner!
Incremental Learning Through Deep Adaptation (Amir
Rosenfeld)
Method                      Old Task Perf.  New Task Perf.  No. Params  Knowledge Reuse?
Train From Scratch          Same            Good            High        No
Fine-Tune last layer        Same            Suboptimal      Low         Yes
Fine-Tune all layers        Decrease        Best            High        Yes
Deep Adaptation (proposed)  Same            Best            Low         Yes
Incremental Learning Through Deep Adaptation (Amir
Rosenfeld)
● Basic idea:
○ Train network N1 on task T1.
○ For task Ti, train Ni by learning how to reuse the filters of N1
○ Reuse == make new filters by linear combinations of learned ones + bias.
○ Reparametrize network dynamically based on task.
● Can control multiple tasks using a vector of switching variables.
[Figure labels: Orig. Filters, New Filters, Modified Filters, Original Filters, Switching Variable, Input]
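A toy sketch of the filter-reuse idea: new filters are linear combinations of the filters learned for the first task, and a switching variable chooses which set is active. Filters are flattened lists, the coefficients are made up, and the bias term is left out; how the paper blends or switches exactly is an assumption here.

```python
def combine_filters(original, coeffs):
    """Build new filters as linear combinations of the original ones.
    coeffs[i][j] is the weight of original filter j in new filter i."""
    size = len(original[0])
    return [
        [sum(c * f[p] for c, f in zip(row, original)) for p in range(size)]
        for row in coeffs
    ]

def switch_filters(original, modified, alpha):
    """alpha = 0 keeps the original task's filters, alpha = 1 uses the new ones."""
    return [
        [(1 - alpha) * o + alpha * m for o, m in zip(f_o, f_m)]
        for f_o, f_m in zip(original, modified)
    ]

original = [[1.0, 0.0], [0.0, 1.0]]
modified = combine_filters(original, coeffs=[[2.0, 1.0], [0.0, 3.0]])
old_task = switch_filters(original, modified, alpha=0)
new_task = switch_filters(original, modified, alpha=1)
```

Only the small coefficient matrix is learned for the new task, which is why the parameter increase stays low and the original task's behaviour is fully preserved when the switch is off.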
Conclusions
● What did we talk about today?
○ Object detection and segmentation: people, faces and pose estimation.
○ New and exciting network architectures
○ Efficient deep learning: optimization and cascades
○ Multiscale information
○ Data augmentation, generation and synthesis
○ One network to rule them all
Networks keep getting more complex
State of the art keeps improving
But still need to be efficient!
Looking for brilliant
researchers
cv@brodmann17.com / amir@brodmann17.com

crfasrnn_presentationcrfasrnn_presentation
crfasrnn_presentation
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Neuromation.io AI Ukraine Presentation
Neuromation.io AI Ukraine PresentationNeuromation.io AI Ukraine Presentation
Neuromation.io AI Ukraine Presentation
 

Cvpr 2017 Summary Meetup

  • 1. CVPR 2017 Summary Assaf Mushinsky Chief Scientist & Co founder
  • 2. About the speaker Assaf Mushinsky ● Co founder & Chief Scientist ● Breakthrough MSc research with Prof. Lior Wolf ● Computer Vision & Deep Learning expert ● Key technical roles at Samsung & Eyesight
  • 3. Brodmann17 ● Founded in 2016 ● Raised $2M ● Team: 10 people, including 6 researchers (PhDs, MScs) ● Core Technology: Deep Learning for Edge Devices
  • 5. CVPR 2017 ● Huge conference! ○ 4950 registrations ○ 783 accepted papers (out of 2620 valid submissions) ○ 215 orals ○ Also… ■ Tutorials! ■ Workshops! ● Papers are available online. ○ Many of them were published months before the conference. ● Videos are available on the YouTube channel: ○ https://www.youtube.com/channel/UC0n76gicaarsN_Y9YShWwhw
  • 8. Agenda ● What are we going to talk about today? ○ Object detection and segmentation: people, faces and pose estimation. ○ New and exciting network architectures ○ Efficient deep learning: optimization and cascades ○ Multiscale information ○ Data augmentation, generation and synthesis ○ “Swiss-knife” network. ● What are we not going to talk about today? ○ Faster R-CNN: https://arxiv.org/abs/1506.01497 ○ ResNet: https://arxiv.org/abs/1512.03385
  • 10. Speed/accuracy trade-offs for modern convolutional object detectors (Google) ● Presents a fair comparison of the leading object detection methods in terms of speed and accuracy. ● Single code base and training framework for a fair comparison. ● Same hardware: Nvidia GeForce GTX Titan X GPU card. ● Multiple hyper-parameter configurations. ● Multiple network architectures. ● Paper: https://arxiv.org/abs/1611.10012 ● Code: https://github.com/tensorflow/models/tree/master/object_detection
  • 11. Speed/accuracy trade-offs for modern convolutional object detectors (Google)
  • 12. Speed/accuracy trade-offs for modern convolutional object detectors (Google)
  • 13. YOLO9000: Better, Faster, Stronger - Joseph Redmon ● Fast and accurate object detection. ● They improve on their first version of YOLO by making it better, faster and stronger. ○ Better: YOLOv1 was fast but not accurate. This one will be more accurate. ○ Faster: New network for faster run time. ○ Stronger: Learn to detect 9000 object classes. ● Paper: https://arxiv.org/abs/1612.08242 ● Code: http://pjreddie.com/yolo9000/ ● Won best paper honorable mention award.
  • 14. YOLO9000: Better, Faster, Stronger - Joseph Redmon ● Batch normalization (+2%) ● High resolution classifier (+4%) ○ Train ImageNet classifier at 224x224 ○ Fine-tune ImageNet classifier at 448x448 ○ Train detector at 448x448 ● Convolutional with anchor boxes ○ YOLOv1 used FC for prediction. ○ Remove FC and predict anchors (-0.3%) ■ But… recall increases 81%→88% ● Select anchors using k-means ● Direct location prediction (+5%) ○ Sigmoid for constrained bounding box prediction instead of unconstrained as in RPN.
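The direct location prediction above can be sketched as follows. This is a minimal illustration based on the box parameterization in the YOLO9000 paper: the sigmoid keeps the predicted center inside the grid cell that owns the prediction, while width and height scale an anchor (prior) box. The function and argument names are hypothetical, and coordinates are in grid-cell units.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, prior_w, prior_h):
    """Decode raw network outputs into a box (center x, center y, width, height)."""
    # The sigmoid bounds the offset to (0, 1), so the center stays in its cell.
    bx = cell_x + sigmoid(tx)
    by = cell_y + sigmoid(ty)
    # Width and height are multiplicative adjustments of the anchor box.
    bw = prior_w * math.exp(tw)
    bh = prior_h * math.exp(th)
    return bx, by, bw, bh


# Zero outputs place the box at the cell center with the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, cell_x=3, cell_y=4, prior_w=2.0, prior_h=1.5))
```

However large the raw outputs get, the center cannot leave its cell, which is the constraint that an unconstrained RPN-style offset lacks.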
  • 15. YOLO9000: Better, Faster, Stronger - Joseph Redmon ● Fine-grained features (+1%) ○ How to combine features with features from previous layer? ○ Passthrough layer ■ Take previous 26x26x512 and stack adjacent features into different channels, get 13x13x2048. ■ Concatenate with original features ● Multi-scale training (+1%) ○ Every 10 batches randomly choose a new image dimension size. ○ Forces the network to learn to predict well across a variety of input dimensions. ● High resolution detector (+2%) ○ Use 544x544 instead of 416x416
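The passthrough layer above is a space-to-depth rearrangement: each 2x2 spatial neighbourhood is stacked into the channel dimension. A minimal sketch on nested Python lists (height x width x channels) follows. The function name is hypothetical, and the channel ordering chosen here is an assumption; the Darknet implementation may order the stacked channels differently.

```python
def space_to_depth(feature, block=2):
    """Stack each block x block spatial neighbourhood into the channel axis."""
    height, width = len(feature), len(feature[0])
    assert height % block == 0 and width % block == 0
    out = []
    for y in range(0, height, block):
        row = []
        for x in range(0, width, block):
            stacked = []
            for dy in range(block):
                for dx in range(block):
                    # Append all channels of one neighbouring position.
                    stacked.extend(feature[y + dy][x + dx])
            row.append(stacked)
        out.append(row)
    return out


# A 26x26x512 map becomes 13x13x2048, matching the shapes on the slide.
fine = [[[0.0] * 512 for _ in range(26)] for _ in range(26)]
coarse = space_to_depth(fine)
print(len(coarse), len(coarse[0]), len(coarse[0][0]))
```

The result has the same spatial size as the final 13x13 feature map, so the two can be concatenated along the channel axis.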
  • 16. YOLO9000: Better?, Faster, Stronger - Joseph Redmon
  • 17. YOLO9000: Better?, Faster, Stronger - Joseph Redmon
  • 18. YOLO9000: Better?, Faster, Stronger - Joseph Redmon
  • 19. YOLO9000: Better, Faster, Stronger - Joseph Redmon ● We want detection to be accurate, but we also want it to be fast. ● Thanks to multi-scale training, the detector can be applied at different input scales for a speed/accuracy trade-off. ● Use Darknet-19 instead of VGG16. ○ Mostly 3x3 convolutions. ○ Like NIN: Use 1x1 filters to compress the feature representation between 3x3 convs.
  • 20. YOLO9000: Better, Faster, Stronger - Joseph Redmon ● How to learn detection for 9000 classes? ● During training mix images from both detection and classification datasets. ○ For detection images, use full backprop. ○ For classification images, use only classification part for backprop. ● Hierarchical classification ○ ImageNet labels are pulled from WordNet. ○ Simplify the problem by building a hierarchical tree from the concepts in ImageNet. ○ Perform classification using conditional probabilities. ● This formulation also works for detection ○ Instead of assuming that every anchor contains an object, they use objectness predictor.
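Classification with conditional probabilities, as described above, means the network predicts the probability of each node given its parent, and the absolute probability of a label is the product along the path to the root. A minimal sketch follows; the tiny tree and the probability values are hypothetical and only illustrate the arithmetic. For detection, the root probability would come from the objectness predictor.

```python
def absolute_probability(node, parent, conditional):
    """Multiply conditional probabilities from a node up to the tree root."""
    prob = 1.0
    while node is not None:
        prob *= conditional[node]
        node = parent[node]
    return prob


# Hypothetical fragment of a WordNet-style tree (child -> parent).
parent = {
    "physical object": None,
    "animal": "physical object",
    "dog": "animal",
    "terrier": "dog",
}
# Hypothetical predictions: Pr(node | parent); the root holds Pr(object).
conditional = {
    "physical object": 1.0,
    "animal": 0.5,
    "dog": 0.5,
    "terrier": 0.25,
}

print(absolute_probability("terrier", parent, conditional))  # 1.0 * 0.5 * 0.5 * 0.25
```

A classification-only image labeled "dog" can then supervise only the nodes on the path from "dog" to the root, without saying anything about specific breeds.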
  • 21. YOLO9000: Better, Faster, Stronger - Joseph Redmon
  • 22. Feature Pyramid Networks for Object Detection (FAIR) ● https://arxiv.org/abs/1612.03144
  • 23. Feature Pyramid Networks for Object Detection (FAIR)
  • 24. RON: Reverse Connection with Objectness Prior Networks for Object Detection ● Same idea as “Feature Pyramid Networks for Object Detection” ● https://arxiv.org/abs/1707.01691
  • 25. Accurate Single Stage Detector Using Recurrent Rolling Convolution ● Same idea as “Feature Pyramid Networks for Object Detection” ● https://arxiv.org/abs/1704.05776
  • 26. Object Detection Circa 2007 Source: Ross Girshick’s object detection tutorial in CVPR 2017 http://deeplearning.csail.mit.edu/instance_ross.pptx
  • 27. Object Detection Today Source: Ross Girshick’s object detection tutorial in CVPR 2017 http://deeplearning.csail.mit.edu/instance_ross.pptx
  • 28. Mask R-CNN - Kaiming He, Ross Girshick (FAIR) ● Instance segmentation with pose estimation for people. ● Extends Faster R-CNN by adding a new branch for the instance mask task. ● Pose estimation can be added by simply adding another branch. ● SOTA accuracy on detection, segmentation and pose estimation at 5 FPS on GPU. ● https://arxiv.org/abs/1703.06870 ● Girshick won the young researcher award.
  • 29. Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
  • 30. Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
  • 31. Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
  • 32. Mask R-CNN - Kaiming He, Ross Girshick (FAIR) ● RoiPool ○ Quantization breaks pixel-to-pixel alignment ○ Too coarse and not good for fine spatial information required for mask. ● RoiAlign ○ Bilinearly sample the proposal region and avoid the quantization. ○ Smoothly normalize features and predictions into coordinate frame free of scale and aspect ratio
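The difference between RoiPool and RoiAlign comes down to how fractional coordinates are handled: RoiAlign reads the feature map with bilinear interpolation instead of rounding to the nearest cell. The sketch below is a simplification under stated assumptions: it works on a single-channel map stored as a list of rows and takes one sample at the center of each output bin, whereas the paper samples several points per bin and aggregates them. Function names are hypothetical.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x)."""
    height, width = len(feature), len(feature[0])
    # Clamp so that the four neighbours always exist.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def roi_align(feature, roi, out_size):
    """Sample an out_size x out_size grid from roi = (y_start, x_start, y_end, x_end)."""
    y_start, x_start, y_end, x_end = roi
    bin_h = (y_end - y_start) / out_size
    bin_w = (x_end - x_start) / out_size
    return [
        [
            bilinear_sample(
                feature,
                y_start + (i + 0.5) * bin_h,
                x_start + (j + 0.5) * bin_w,
            )
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


feature = [[y * 4 + x for x in range(4)] for y in range(4)]
print(roi_align(feature, (0.5, 0.5, 2.5, 2.5), out_size=2))
```

Note that the RoI boundaries (0.5 and 2.5) are never rounded, so the sampled values stay aligned with the proposal at sub-cell precision.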
  • 33. Mask R-CNN - Kaiming He, Ross Girshick (FAIR)
  • 34. Mask R-CNN - Kaiming He, Ross Girshick (FAIR) ● Backbone architecture ○ ResNet ○ ResNeXt ○ FPN ● Mask representation ○ FC vs. Convolutional ○ Multinomial vs. Independent Masks: softmax vs. sigmoid ○ Class-Specific vs. Class-Agnostic Masks: almost same accuracy ● Multi-task learning ○ Mask task improves object detection accuracy. ○ Keypoint task reduces object detection accuracy.
  • 35. Mask R-CNN - Kaiming He, Ross Girshick (FAIR) ● Pose estimation ○ Simply add an additional branch. ○ Model a keypoint’s location as a one-hot mask, and adopt Mask R-CNN to predict K masks. ○ Experiments are mainly to demonstrate the generality of the Mask R-CNN framework. ○ RoiAlign improves this task’s accuracy as well.
  • 36. Learning non-maximum suppression ● Object detectors are mostly trained end-to-end, except for the NMS. ○ NMS is still fully hand-crafted, and forces a trade-off between recall and precision. ● Training loss is not evaluation loss. ○ Training is performed without NMS ○ During evaluation, multiple detections for same object count as false positives. ● https://arxiv.org/abs/1705.02950
  • 37. Learning non-maximum suppression ● Additional blocks that: ○ Encode pairwise information. ○ For each detection, pool information from all pairings. ○ Update feature vector. ○ Repeat. ● New loss: ○ Only one positive candidate per object, instead of the current practice of treating every detection with IoU > 50% as a positive.
  • 39. Focal Loss for Dense Object Detection (FAIR) ● Two-stage detectors are usually the most accurate. ● Single-stage detectors are simpler and usually faster. ● Reshaping the cross-entropy loss to down-weight well-classified samples can improve the accuracy of single-stage detectors. ● This approach is shown to be better than online hard negative mining. ● Architecture is based on FPN. ● https://arxiv.org/abs/1708.02002
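For reference, the hand-crafted step that this work proposes to replace is usually greedy NMS: keep the highest-scoring detection, drop every remaining detection that overlaps it too much, and repeat. The sketch below shows that common baseline, not the learned method from the paper. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))
```

The single fixed threshold is where the recall/precision trade-off comes from: a low threshold suppresses true neighbours in crowds, and a high threshold lets duplicate detections through.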
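The reshaped cross-entropy can be sketched for a single binary prediction as below. It follows the form given in the paper, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), with the paper's reported defaults gamma = 2 and alpha = 0.25. The function name is hypothetical, and this scalar version is only meant to show the weighting, not an efficient training loss.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction p = Pr(class 1) with label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)   # confident and wrong: keeps most of its loss
print(easy, hard)
```

With gamma = 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which makes the role of gamma easy to check.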
  • 40. Scale aware face detection ● Detection of small objects is computationally expensive. ● But what if there are no small objects in an image? Why should we waste computation on scanning those scales? ● We can divide face detection into two tasks ○ Estimate the scale of faces in a given image. ○ For each scale, resize to fixed scale and apply detection. ● https://arxiv.org/abs/1706.09876
  • 42. Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields ● Multi-person pose estimation is difficult ○ Unknown number of people ○ Interactions between people make the association of parts difficult. ○ Runtime complexity tends to grow with the number of people in the image. ● The proposed architecture is designed to jointly learn part locations and their association. ● Paper: https://arxiv.org/abs/1611.08050 ● Code: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
  • 43. Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields
  • 44. Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields
  • 45. Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields ● Two branches: ○ Part location confidence maps. ○ Part affinity fields. ● Multi-stage ○ Every stage gets the output of the previous stage as well as the input image. ○ Output is refined over the different stages, allowing resolution of conflicts. ● Multi-Person Parsing using PAFs
  • 46. Towards Accurate Multi-person Pose Estimation in the Wild (Google) ● Two stage cascade model: ○ Apply a Faster-RCNN person detector to produce a bounding box around each candidate person instance. ○ Apply a pose estimator to the image crop extracted around each candidate person instance in order to localize its keypoints and re-score the corresponding proposal. ● https://arxiv.org/abs/1701.01779 ● They have a newer version that works without an object detector and is very similar to the part affinity field method. ● A demo of the newer version was presented at the conference.
  • 47. LCR-Net: Localization-Classification-Regression for Human Pose ● https://www.researchgate.net/publication/315867122_LCR-Net_Localization-Classification-Regression_for_Human_Pose
  • 48. Coarse-To-Fine Volumetric Prediction for Single-Image 3D Human Pose ● Common approaches have drawbacks: ○ Estimating 3D pose by regression of (x,y,z) ○ 2D pose map and 3D refinement ● Solution: ○ 3D pose map estimation. ● https://arxiv.org/abs/1611.07828
  • 49. ArtTrack: Articulated Multi-Person Tracking in the Wild ● How to use temporal information for multi-person pose tracking? ○ Build a spatio-temporal graph whose edges connect different parts within the same frame and the same part across different frames. ● Paper: https://arxiv.org/abs/1612.01465 ● Dataset: http://www.posetrack.net ● Code: https://github.com/eldar/pose-tensorflow
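Parsing with PAFs relies on scoring each candidate limb: the part affinity field is sampled along the segment between two detected parts, and each sampled vector is projected onto the segment direction. The sketch below approximates that line integral with uniformly spaced samples. To stay self-contained it takes the field as a Python callable returning a 2D vector, which is an assumption for illustration; in the real system the field is a network output map. Names are hypothetical.

```python
import math


def paf_association_score(paf, joint_a, joint_b, num_samples=10):
    """Average alignment between a vector field and the segment joint_a -> joint_b.

    Assumes num_samples >= 2 so that both endpoints are sampled.
    """
    ax, ay = joint_a
    bx, by = joint_b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)
        vx, vy = paf(ax + t * dx, ay + t * dy)
        total += vx * ux + vy * uy  # projection of the field onto the limb
    return total / num_samples


# Hypothetical field that points to the right everywhere.
field = lambda x, y: (1.0, 0.0)
print(paf_association_score(field, (0, 0), (4, 0)))  # aligned with the field
print(paf_association_score(field, (0, 0), (0, 4)))  # perpendicular to it
```

Candidate pairs with high scores are likely to belong to the same person, which turns the association step into a matching problem over these scores.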
  • 50. Let’s take a break When we get back: Award winning architectures Efficient neural networks A single network that does everything
  • 52. Densely Connected Convolutional Networks ● Residual connections in ResNet allowed networks to be substantially deeper, more accurate, and efficient to train. ● Dense connections take this idea further by connecting every pair of layers in a block using channel-wise concatenation. ● Paper: https://arxiv.org/abs/1608.06993 ● Code: https://github.com/liuzhuang13/DenseNet ● Memory-efficient implementation: https://arxiv.org/abs/1707.06990 ● Won the best paper award.
  • 53. Densely Connected Convolutional Networks ● Residual connections ● Dense connections ● Transition layers ○ The dense connectivity can’t be applied when scale changes. ○ This is why convolution and pooling layers are added between dense blocks.
  • 57. Densely Connected Convolutional Networks ● Growth rate ○ Every layer produces k outputs ○ The input for the lth layer is k×(l-1) feature maps. ○ To prevent the network from growing too wide and to improve the parameter efficiency k has to be limited to a small integer. ○ Experiments show k=12 is sufficient to obtain state-of-the-art results. ● Bottleneck layers ○ Even with a small growth rate, the number of inputs for some layers can get very large. ○ 1×1 convolution is used to reduce the number of features to 4k. ● Compression ○ To further improve model compactness, the transition layer transform the m feature maps of its input to m/2.
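The growth-rate arithmetic above can be checked with a few lines. In the sketch, k0 is the number of channels entering the dense block (following the paper's notation); the function names and the example sizes are hypothetical.

```python
def dense_block_channels(k0, growth_rate, num_layers):
    """Input width of every layer in a dense block, plus the block's output width."""
    # Layer l sees the block input and the k outputs of each of the l-1 earlier layers.
    inputs = [k0 + growth_rate * (layer - 1) for layer in range(1, num_layers + 1)]
    output = k0 + growth_rate * num_layers
    return inputs, output


def transition_channels(channels, compression=0.5):
    """Channels left after a transition layer with the given compression factor."""
    return int(channels * compression)


inputs, output = dense_block_channels(k0=24, growth_rate=12, num_layers=6)
print(inputs)                       # input width grows linearly with depth
print(output)                       # width handed to the transition layer
print(transition_channels(output))  # halved by compression
print(4 * 12)                       # bottleneck width 4k for k = 12
```

The linear growth of the input width is why bottleneck layers and compression matter: without them, deep blocks spend most of their computation on very wide inputs.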
  • 59. Densely Connected Convolutional Networks ● Stronger gradient flow. ● Parameter and computational efficiency. ● Diversified features due to concatenation of all previous features. ● Maintains both high & low complexity features. ● Less prone to overfit than ResNet when large amounts of data isn’t available. Works better even without augmentation.
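The growth-rate and compression arithmetic from the DenseNet slides can be sketched in a few lines. This is only an illustration: the function names and the example values (k0 = 24, six layers) are assumptions chosen for the demo, not numbers from the talk.

```python
def dense_block_inputs(k0, k, num_layers):
    """Number of input feature maps seen by each layer of one dense block.

    k0 is the channel count entering the block, k is the growth rate.
    Layer l (1-based) sees k0 + k * (l - 1) channels because every earlier
    layer's k outputs are concatenated onto the block input.
    """
    return [k0 + k * (l - 1) for l in range(1, num_layers + 1)]


def transition_channels(m, compression=0.5):
    """Channels left after a transition layer that compresses m feature maps."""
    return int(m * compression)


k0, k, num_layers = 24, 12, 6          # illustrative values only
inputs = dense_block_inputs(k0, k, num_layers)
block_output = k0 + k * num_layers     # all layer outputs plus the block input
compressed = transition_channels(block_output)
print(inputs, block_output, compressed)
```

With compression 0.5 the transition layer halves the channel count, which matches the m to m/2 rule on the slide.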
  • 60. Multi-Scale Dense Convolutional Networks for Efficient Prediction ● Multi-scale networks with multiple classifiers. ● Multiple classifiers allow for cascaded computation. ● Paper: https://arxiv.org/abs/1703.09844
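One possible reading of "cascaded computation" is an early-exit rule: evaluate the intermediate classifiers in order and stop at the first one that is confident enough. The sketch below assumes that rule and a hypothetical confidence threshold. For simplicity it takes a precomputed, non-empty list of class probabilities, whereas a real network would compute each classifier lazily.

```python
def cascaded_predict(classifier_outputs, threshold):
    """Return (predicted_class, classifiers_used) with early exit.

    classifier_outputs: non-empty list of probability lists, ordered from the
    earliest (cheapest) classifier to the last (most expensive) one.
    """
    for depth, probs in enumerate(classifier_outputs, start=1):
        confidence = max(probs)
        if confidence >= threshold:
            break  # confident enough, skip the remaining classifiers
    # If no classifier passed the threshold, the last one is used.
    return probs.index(confidence), depth


outputs = [[0.5, 0.5], [0.3, 0.7], [0.1, 0.9]]
easy = cascaded_predict(outputs, threshold=0.6)
hard = cascaded_predict(outputs, threshold=0.95)
print(easy, hard)
```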
  • 61. Dual Path Networks ● ResNet enables feature reuse. ● DenseNet enables exploration of new features. ● SOTA accuracy on ImageNet. ● https://arxiv.org/abs/1707.01629
  • 63. Dual Path Networks ● Pascal detection and segmentation.
  • 64. Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups ● Filter group: ○ In normal convolutional layers, all the filters process all input features. ○ Instead, break the filters and input features into groups. ● Hierarchical filter groups: ○ Start with a large number of groups and reduce it as the model goes deeper. ● Didn’t compare to non-hierarchical filter groups. ● Reduces: ○ Model size ○ Running time ○ Memory consumption ● Can even improve accuracy ● https://arxiv.org/abs/1605.06489
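To see why filter groups shrink the model, it helps to count weights. The helper below is a minimal sketch. The layer sizes are made up for illustration and biases are ignored.

```python
def grouped_conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a kernel x kernel convolution split into filter groups.

    Each filter only sees c_in / groups input channels, so the weight count
    drops by a factor of `groups` compared with a normal convolution.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


full = grouped_conv_params(256, 256, 3)               # normal convolution
four_groups = grouped_conv_params(256, 256, 3, groups=4)
print(full, four_groups, full // four_groups)
```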
  • 65. Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups
  • 66. Xception: Deep Learning With Depthwise Separable Convolutions ● Replaces Inception modules with depthwise separable convolutions. ● Regular convolutions and depthwise separable convolutions lie at the two extremes of a discrete spectrum. ● Inception modules are an intermediate point in between. ● Slightly outperforms Inception V3 on the ImageNet dataset ● Significantly outperforms Inception V3 on a larger classification dataset with 350 million images and 17,000 classes ● https://arxiv.org/abs/1610.02357
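A depthwise separable convolution factors one layer into a per-channel spatial filter followed by a 1×1 pointwise convolution. The sketch below only compares weight counts of the two extremes mentioned on the slide. The channel sizes are illustrative assumptions and biases are left out.

```python
def standard_conv_params(c_in, c_out, kernel):
    """Weights of a regular convolution: every filter sees every input channel."""
    return c_in * c_out * kernel * kernel


def separable_conv_params(c_in, c_out, kernel):
    """Weights of a depthwise separable convolution."""
    depthwise = c_in * kernel * kernel   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


regular = standard_conv_params(256, 256, 3)
separable = separable_conv_params(256, 256, 3)
print(regular, separable, round(regular / separable, 2))
```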
  • 67. Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) ● ResNet + Inception = ResNeXt ● 2nd place ILSVRC 2016 ● Paper: https://arxiv.org/abs/1611.05431 ● Code: https://github.com/facebookresearch/ResNeXt
  • 68. Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) ● This is actually equivalent to filter groups.
  • 69. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices ● Minimizes the damage of filter groups by shuffling features between groups ● https://arxiv.org/abs/1707.01083
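The shuffle itself is just a reshape and transpose over the channel axis. Here is a minimal sketch on a plain list of channel indices, assuming the channels of each group are stored contiguously. The function name is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so the next grouped layer sees every group.

    Equivalent to reshaping to (groups, channels_per_group), transposing,
    and flattening again.
    """
    n = len(channels)
    if n % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
print(shuffled)
```

After the shuffle, each group of consecutive channels contains features that originated in different groups, which is what lets information flow between them.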
  • 70. Feedback Networks ● Iterative processing of the input ● Each iteration improves on the previous one using the previous features and the input. ● https://arxiv.org/abs/1612.09508
  • 71. Dilated Residual Networks ● Classification networks gradually reduce the size of the activations until we are left with a single feature vector. ● Classification is usually a proxy task used to pretrain networks before they are transferred to other applications. ● We lose the spatial information that might be beneficial to tasks such as localization or segmentation. ● https://arxiv.org/abs/1705.09914
  • 72. Dilated Residual Networks ● We can remove the pooling layers and avoid the dimension reduction. ● But! Removing the pooling layers will reduce the network’s receptive field and hurt accuracy. ● How can we avoid spatial information loss and still have a large receptive field?
  • 73. Dilated Residual Networks ● Dilated convolutions: ○ Sparse filter. ○ Same output as filter with stride. ○ Doesn’t skip any input data. ○ Doesn’t change the data size. ● Advantages: ○ Increase receptive field. ○ Increase spatial information. ○ Doesn’t increase the number of network parameters.
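A one-dimensional toy version makes the properties listed above concrete: the output keeps the input length, no input position is skipped, and the receptive field grows without adding weights. This is a sketch with zero padding and an odd-length kernel assumed. The function names are illustrative.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Same-size 1D convolution (no kernel flip) with zero padding.

    The kernel taps are spaced `dilation` samples apart, so the filter is
    sparse but is still evaluated at every input position.
    """
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(signal):   # positions outside count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Span of input samples covered by one dilated filter."""
    return dilation * (kernel_size - 1) + 1


signal = [1, 2, 3, 4, 5, 6, 7]
result = dilated_conv1d(signal, [1, 1, 1], dilation=2)
print(result, receptive_field(3, 1), receptive_field(3, 2))
```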
  • 74. Dilated Residual Networks ● ResNet to Dilated Residual Network (DRN) ○ Remove stride, compensate with dilation for groups 4 and 5. ○ No need to apply this to groups 1, 2 and 3 because stride 8 is known to preserve most of the information. ○ Original output size was 7×7, new output size is 28×28. ○ Improves recognition of small objects.
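The 7×7 versus 28×28 figures follow from the total stride. The sketch below assumes a 224×224 input and five stride-2 stages in the original ResNet. Neither assumption is stated on the slide.

```python
def output_size(input_size, strides):
    """Spatial size after applying a sequence of strided stages."""
    size = input_size
    for s in strides:
        size //= s
    return size


resnet_strides = [2, 2, 2, 2, 2]   # total stride 32
drn_strides = [2, 2, 2, 1, 1]      # stride removed in groups 4 and 5, total stride 8
print(output_size(224, resnet_strides), output_size(224, drn_strides))
```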
  • 75. Dilated Residual Networks ● DRN-B-26 ○ Replaces early pooling with residual blocks. ○ Adds residual blocks with reduced dilation at the end of the network. ● DRN-C-26 ○ Removes residual connections from some of the added blocks. ○ Added layers in DRN-B-26 didn’t remove gridding artifacts due to residual connections which propagated artifacts.
  • 76. Dilated Residual Networks ● ImageNet Classification ○ DRN-A outperforms the ResNet with the same number of layers and parameters. ○ Each DRN-C significantly outperforms the corresponding DRN-A, showing degridding is beneficial.
  • 77. Dilated Residual Networks ● ImageNet weakly-supervised localization ○ Lower is better. ○ DRN-C-26 outperforms DRN-A-50 despite lower depth and classification accuracy. ○ DRN-C-26 also outperforms ResNet-101.
  • 78. Dilated Residual Networks ● Semantic Segmentation ○ ResNet-101 achieves 66.6 mean IoU
  • 80. Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade ● Deep layer cascade method that improves the accuracy and speed of semantic segmentation. ● The model is initially trained as a multi-loss model. ● A second training stage jointly fine-tunes the model as a cascade. ● Runs ~15 FPS ● https://arxiv.org/abs/1704.01344
  • 81. Mimicking Very Efficient Network for Object Detection ● Train a small network to mimic the output of a larger one. ○ The large network acts as supervision for training the smaller network. ○ The small network is trained using L2 loss to mimic the output of the larger one. ○ Can be expanded to two-stage mimicking for training efficient Faster R-CNN / R-FCN. ● Experiments ○ R-FCN w/ Inception on Caltech: 7.15 ○ R-FCN w/ Inception/2 on Caltech: 8.88 ○ R-FCN w/ Inception/2 mimic on Caltech: 7.31 ● http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Mimicking_Very_Efficient_CVPR_2017_paper.pdf
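The mimic objective can be sketched as a mean squared difference between the two networks' outputs. Which features are compared and how the loss is normalized in the paper are not given on the slide, so the flat feature vectors and the averaging below are assumptions.

```python
def mimic_l2_loss(student_features, teacher_features):
    """Mean squared difference between student and teacher outputs.

    The teacher output is treated as a fixed regression target, so only the
    student would receive gradients from this loss.
    """
    if len(student_features) != len(teacher_features):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_features, teacher_features)]
    return sum(squared) / len(squared)


loss = mimic_l2_loss([1.0, 2.0, 3.0], [1.0, 0.0, 5.0])
print(loss)
```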
  • 82. Spatially Adaptive Computation Time for Residual Networks ● Automatically learn for which pixels to compute the residual functions and for which to simply keep the current value. ● Each layer outputs a confidence that accumulates until it passes a threshold; computation then stops for that pixel. ● Paper: https://arxiv.org/abs/1612.02297 ● Code: https://github.com/mfigurnov/sact
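The halting rule can be illustrated per pixel: accumulate the per-layer confidences and stop once the running total passes a threshold. The threshold value and the function name below are assumptions for the sketch. The released code linked above is the reference for the actual method.

```python
def layers_evaluated(confidences, threshold=1.0):
    """Number of residual layers computed for one pixel before halting.

    confidences: per-layer halting scores for this pixel, in layer order.
    If the threshold is never reached, every layer is evaluated.
    """
    total = 0.0
    for n, score in enumerate(confidences, start=1):
        total += score
        if total >= threshold:
            return n
    return len(confidences)


# One list of scores per pixel: an "easy" pixel halts early, a "hard" one late.
pixels = [[0.9, 0.2, 0.1, 0.1], [0.3, 0.4, 0.5, 0.9], [0.1, 0.1, 0.1, 0.1]]
depth_map = [layers_evaluated(p) for p in pixels]
print(depth_map)
```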
  • 83. LCNN: Lookup-based Convolutional Neural Network (XNOR.AI) ● Create a dictionary for convolutions. ● Convolution filters are weighted combinations of dictionary entries. ● https://arxiv.org/abs/1611.06473
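As a rough illustration of the lookup idea, a filter can be rebuilt from a few dictionary entries and their coefficients instead of being stored in full. The data layout below (flat lists, an index list and a coefficient list) is an assumption for the demo and not the paper's exact formulation.

```python
def build_filter(dictionary, indices, coefficients):
    """Reconstruct one filter as a weighted sum of selected dictionary vectors."""
    length = len(dictionary[0])
    weights = [0.0] * length
    for idx, coeff in zip(indices, coefficients):
        for j in range(length):
            weights[j] += coeff * dictionary[idx][j]
    return weights


dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
filt = build_filter(dictionary, indices=[0, 2], coefficients=[2.0, 0.5])
print(filt)
```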
  • 84. Binarized Neural Network with Separable Filters ● They build on Hubara’s work on binarized NNs. ● Breaking 3x3 filters into 1x3 and 3x1 filters. ● 30% faster, minor drop in accuracy. ● https://arxiv.org/abs/1707.04693
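A 3x3 filter that factors into a 3x1 column and a 1x3 row is their outer product, so it needs 6 weights instead of 9. The sketch below builds such a rank-1 binary filter. The specific ±1 values are made up for illustration.

```python
def outer(column, row):
    """Outer product: the 3x3 filter equivalent to a 3x1 then a 1x3 filter."""
    return [[c * r for r in row] for c in column]


column = [1, -1, 1]    # 3x1 binary filter
row = [1, 1, -1]       # 1x3 binary filter
full_filter = outer(column, row)
weights_separable = len(column) + len(row)
weights_full = len(column) * len(row)
print(full_filter, weights_separable, weights_full)
```

Every entry of the product is still ±1, so the separable form stays binarized. Not every binary 3x3 filter has rank 1, which is one plausible source of the small accuracy drop.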
  • 85. Data
  • 86. Learning From Simulated and Unsupervised Images Through Adversarial Training (Apple) ● Real training data is expensive. Can we use simulated data? ○ Simulated data is cheap and we don’t need to annotate it. ○ There is a gap between simulated and real images. ● How can we make synthetic images look more real? ● How can we do that without changing the properties of the synthetic images? ● They use this method for eye gaze estimation and hand pose estimation. ● Paper: https://arxiv.org/abs/1612.07828 ● Won best paper award.
  • 87. Learning From Simulated and Unsupervised Images Through Adversarial Training (Apple) ● They train a GAN to modify the synthetic image to look more real. ○ The generator modifies the image to fool the discriminator. ○ The discriminator tries to classify real vs. synthetic images. ● They make small local changes because the generator is a ResNet with a small receptive field. ● The loss of the discriminator is local because it ends with a loss map instead of a single loss. ● Humans reached 80% accuracy on synthetic vs. real images but only 51% on refined vs. real images.
  • 88. A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection ● Adversarial network that generates examples with occlusions and deformations. ● https://arxiv.org/abs/1704.03414
  • 89. Training Object Class Detectors With Click Supervision ● 9× faster labeling than full supervision. ● No comparison to state of the art in terms of accuracy. ● Two-click validation helps determine the scale of the object. ● Start with an annotator verification process using a pre-labeled test set. ● https://arxiv.org/abs/1704.06189
  • 90. Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach ● Labels are expensive to obtain because they require human labeling. ● They want to avoid the need for a set of clean labels, or knowledge of the noise statistics. ● During training, correct the loss function by reweighting the loss according to estimated noise between classes. ● https://arxiv.org/abs/1609.03683
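One way to correct the loss, sketched here under assumptions, is a "forward" correction: push the model's clean-label probabilities through an estimated noise transition matrix and take the cross-entropy against the observed noisy label. The matrix `T` (with `T[i][j]` the probability that true class i is labeled as class j) and all values below are illustrative.

```python
import math


def forward_corrected_loss(probs, noisy_label, T):
    """Cross-entropy on the noisy label after applying the noise model.

    probs: model probabilities over the true classes.
    T[i][j]: estimated probability that true class i is observed as class j.
    """
    noisy_prob = sum(probs[i] * T[i][noisy_label] for i in range(len(probs)))
    return -math.log(noisy_prob)


probs = [0.9, 0.1]
T = [[0.8, 0.2],    # class 0 is mislabeled as class 1 in 20% of the cases
     [0.0, 1.0]]    # class 1 is never mislabeled
corrected = forward_corrected_loss(probs, noisy_label=1, T=T)
plain = -math.log(probs[1])
print(round(corrected, 4), round(plain, 4))
```

The corrected loss is smaller than the plain cross-entropy here because the noise model explains why a confident class-0 prediction may still carry label 1.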
  • 91. Harvesting Multiple Views for Marker-Less 3D Human Pose Annotations ● Use pretrained pose net to estimate probability map for each part. ● Do this for multiple views. ● Fuse information into single pose estimation. ● Use this pose as new ground truth for training. ● Automatic annotations help to improve accuracy.
  • 92. One model to rule them all
  • 93. Ubernet: Training a Universal Convolutional Neural Network ● Computer vision involves a host of tasks, such as boundary detection, semantic segmentation, surface estimation, object detection, image classification. ● In a joint application, running a network for each task is infeasible. ● Can one network solve all of our computer vision tasks? ○ Of course. Naively combine multiple networks and get a single network. ○ Can we do better? ● https://arxiv.org/abs/1609.02132
  • 94. Ubernet: Training a Universal Convolutional Neural Network ● How do we train multiple tasks without having single dataset for all tasks?
  • 95. Ubernet: Training a Universal Convolutional Neural Network ● Architecture ○ Based on VGG16. ○ A minimal number of additional, task-specific layers. ○ Skip layers to combine the best features for every task. ○ Skip-layer connections are normalized using batch norm ○ Multi-resolution CNN ○ Atrous convolution ● Training loss ○ Adapt loss per sample. ■ Zero loss when ground truth is missing. ○ Asynchronous SGD ■ Accumulate gradients for each task ■ Only update weights once enough samples have been seen for a specific task.
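The training-loss bullets above can be sketched as a per-task accumulator: samples without ground truth for a task contribute nothing, and an update is released only after enough labeled samples have been seen. A scalar stands in for the gradient tensor, and the class name and batch size are hypothetical.

```python
class TaskGradientAccumulator:
    """Collects gradients for one task and emits an averaged update."""

    def __init__(self, batch_size):
        self.batch_size = batch_size   # labeled samples needed per update
        self.grad_sum = 0.0
        self.count = 0

    def add(self, grad):
        """Add one sample's gradient; pass None when ground truth is missing.

        Returns the averaged gradient once enough samples were seen,
        otherwise None (no weight update yet).
        """
        if grad is None:               # zero loss, hence no gradient
            return None
        self.grad_sum += grad
        self.count += 1
        if self.count < self.batch_size:
            return None
        update = self.grad_sum / self.count
        self.grad_sum, self.count = 0.0, 0
        return update


acc = TaskGradientAccumulator(batch_size=2)
updates = [acc.add(g) for g in [1.0, None, 3.0, 5.0]]
print(updates)
```

Keeping one accumulator per task lets tasks with few labeled samples update less often than tasks whose ground truth is present in every image.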
  • 96. Ubernet: Training a Universal Convolutional Neural Network ● Low memory back-propagation
  • 97. Ubernet: Training a Universal Convolutional Neural Network
  • 98. Pascal In Detail - Make Pascal Great Again! ● https://sites.google.com/view/pasd ● Measure the progress in image understanding as reflected in a diverse set of visual tasks. ● Single-Task Challenges ○ Image Classification, Object Detection, Semantic Segmentation, Instance Segmentation, Object Part Segmentation, Objectness, Boundary Detection, Occlusion Recognition, Human Keypoint Estimation, Human Action Recognition ● Multi-Task Challenges ○ Boxes to Points Triathlon: Object Detection, Instance Segmentation, Keypoint Estimation ○ PASCAL++ Triathlon: Image Classification, Object Detection, Semantic Segmentation ○ Humans in Detail Triathlon: Human Parts, Keypoints, Action ● PASCAL Decathlon: ○ All 10 tasks
  • 99. Taster - Visual Domain Decathlon ● http://www.robots.ox.ac.uk/~vgg/decathlon/ ● Solve ten image classification problems simultaneously. a. Aircraft b. CIFAR-100 c. Daimler pedestrian d. Describable textures e. German traffic signs f. ImageNet g. VGG-Flowers h. Omniglot i. SVHN j. UCF101 Dynamic Images
  • 100. Learning multiple visual domains with residual adapters ● Primary goal is to develop neural network architectures that can work well in a multiple-domain setting. ● Learn adapters that can be replaced for specific tasks. ● https://arxiv.org/abs/1705.08045
  • 101. Incremental Learning Through Deep Adaptation (Amir Rosenfeld) ● It is often desirable to be able to add new capabilities without hindering performance of already learned tasks. ● Fully preserves performance on the original task, with only a small increase (around 20%) in the number of required parameters. ○ Other methods typically double the number of parameters. ● The learned architecture can be controlled to switch between various learned representations, enabling a single network to solve a task from multiple different domains. ● https://arxiv.org/abs/1705.04228 ● Slides: https://sites.google.com/view/amirrosenfeld ● Challenge winner!
  • 102. Incremental Learning Through Deep Adaptation (Amir Rosenfeld)
    Method                     | Old Task Perf. | New Task Perf. | No. Params | Knowledge Reuse?
    Train From Scratch         | Same           | Good           | High       | No
    Fine-Tune last layer       | Same           | Suboptimal     | Low        | Yes
    Fine-Tune all layers       | Decrease       | Best           | High       | Yes
    Deep Adaptation (proposed) | Same           | Best           | Low        | Yes
  • 103. Incremental Learning Through Deep Adaptation (Amir Rosenfeld) ● Basic idea: ○ Train network N1 on task T1. ○ For task Ti, train Ni by learning how to reuse the filters of N1 ○ Reuse == make new filters by linear combinations of learned ones + bias. ○ Reparametrize network dynamically based on task. ● Can control multiple tasks using a vector of switching variables ● [Figure labels: Orig. Filters, New Filters, Modified Filters, Original Filters, Switching Variable, Input]
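The filter-reuse idea can be sketched with flattened filters: new filters are linear combinations of the original ones, and a switching variable blends between original and adapted filters. The blending formula, the names, and the omission of the bias term are simplifying assumptions here, not a statement of the paper's exact parametrization.

```python
def adapt_filters(original, weights):
    """Build new filters as linear combinations of the original filters.

    original: list of flattened filters from the first network.
    weights[i][j]: contribution of original filter j to new filter i.
    """
    length = len(original[0])
    return [[sum(w * f[j] for w, f in zip(row, original)) for j in range(length)]
            for row in weights]


def switch(original, adapted, alpha):
    """Blend filters: alpha = 0 keeps the old task, alpha = 1 selects the new one."""
    return [[alpha * a + (1 - alpha) * o for a, o in zip(a_row, o_row)]
            for a_row, o_row in zip(adapted, original)]


original = [[1.0, 0.0], [0.0, 1.0]]
adapted = adapt_filters(original, weights=[[2.0, 1.0], [0.0, 3.0]])
print(adapted, switch(original, adapted, 0.0), switch(original, adapted, 1.0))
```

Only the combination weights are learned for the new task, which is consistent with the small parameter overhead reported on the earlier slide.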
  • 104. Conclusions ● What did we talk about today? ○ Object detection and segmentation: people, faces and pose estimation. ○ New and exciting network architectures ○ Efficient deep learning: optimization and cascades ○ Multiscale information ○ Data augmentation, generation and synthesis ○ One network to rule them all
  • 105. Conclusions ● Networks keep getting more complex
  • 106. Conclusions ● State of the art keeps improving
  • 107. Conclusions ● But we still need to be efficient!