Deep Residual Learning for
Image Recognition
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
Presented by – Sanjay Saha, School of Computing, NUS
CS6240 – Multimedia Analysis – Sem 2 AY2019/20
Objective | Problem Statement
Motivation
Performance of plain networks degrades as the architecture gets deeper
Image source: paper
Main Idea
• Skip connections / shortcuts
• Trying to avoid:
  'Vanishing gradients'
  'Long training times'
Image source: Wikipedia
Contributions | Problem Statement
• These extremely deep residual nets are easy to optimize, but the counterpart "plain" nets (that simply stack layers) exhibit higher training error when the depth increases.
• These deep residual nets can easily enjoy accuracy gains from greatly increased depth, producing results substantially better than previous networks.
A residual learning framework to ease the training of networks that
are substantially deeper than those used previously.
(Schematic: performance vs. depth)
Literature
Literature Review
• Partial solutions for vanishing gradients:
  • Batch Normalization – normalizes the activations over each mini-batch.
  • Smart initialization of weights – e.g., Xavier initialization.
  • Training portions of the network individually.
• Highway Networks
  • Feature gated connections of the form
    Y = f(x) · sigmoid(Wx + b) + x · (1 − sigmoid(Wx + b))  (a small sketch follows this slide)
  • Data-dependent gated shortcuts with extra parameters.
  • When the gates are 'closed', the layers become 'non-residual'.
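A minimal sketch of the highway-style gated connection named above, written in PyTorch. The class name, layer sizes, and the ReLU on the transform branch are illustrative assumptions, not the original Highway Networks configuration.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Gated shortcut: y = f(x) * T(x) + x * (1 - T(x)), with T(x) = sigmoid(Wx + b)."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # the transform branch f(x)
        self.gate = nn.Linear(dim, dim)       # the gate T(x): data-dependent, adds parameters

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))       # gate values in (0, 1)
        y = torch.relu(self.transform(x)) * t + x * (1.0 - t)
        # As t -> 1 the carry term vanishes and the layer behaves like a plain
        # (non-residual) layer; ResNet instead keeps a parameter-free identity shortcut.
        return y
```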
ResNet | Design | Architecture
Presented by โ€“ Sanjay Saha (sanjaysaha@u.nus.edu) School of Computing
Plain Block
๐‘Ž[๐‘™] ๐‘Ž[๐‘™+2]
๐‘Ž[๐‘™+1]
๐‘ง[๐‘™+1]
= ๐‘Š[๐‘™+1]
๐‘Ž[๐‘™]
+ ๐‘[๐‘™+1]
โ€œlinearโ€
๐‘Ž[๐‘™+1] = ๐‘”(๐‘ง[๐‘™+1])
โ€œreluโ€
๐‘ง[๐‘™+2] = ๐‘Š[๐‘™+2] ๐‘Ž[๐‘™+1] + ๐‘[๐‘™+2]
โ€œoutputโ€
๐‘Ž[๐‘™+2]
= ๐‘” ๐‘ง ๐‘™+2
โ€œrelu on outputโ€
Image source: deeplearning.ai
Residual Block
๐‘Ž[๐‘™] ๐‘Ž[๐‘™+2]
๐‘Ž[๐‘™+1]
๐‘ง[๐‘™+1]
= ๐‘Š[๐‘™+1]
๐‘Ž[๐‘™]
+ ๐‘[๐‘™+1]
โ€œlinearโ€
๐‘Ž[๐‘™+1] = ๐‘”(๐‘ง[๐‘™+1])
โ€œreluโ€
๐‘ง[๐‘™+2] = ๐‘Š[๐‘™+2] ๐‘Ž[๐‘™+1] + ๐‘[๐‘™+2]
โ€œoutputโ€
๐‘Ž[๐‘™+2]
= ๐‘” ๐‘ง ๐‘™+2
+ ๐‘Ž ๐‘™
โ€œrelu on output plus inputโ€
Image source: deeplearning.ai
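A minimal PyTorch sketch contrasting the two blocks above. Fully connected layers stand in for the weight layers so the code mirrors the slide's notation; shapes and names are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerBlock(nn.Module):
    """Two stacked layers; `residual=True` adds the identity shortcut a[l]."""
    def __init__(self, dim, residual=True):
        super().__init__()
        self.layer1 = nn.Linear(dim, dim)   # W[l+1], b[l+1]
        self.layer2 = nn.Linear(dim, dim)   # W[l+2], b[l+2]
        self.residual = residual

    def forward(self, a_l):
        z1 = self.layer1(a_l)               # z[l+1] = W[l+1] a[l] + b[l+1]
        a1 = F.relu(z1)                     # a[l+1] = g(z[l+1])
        z2 = self.layer2(a1)                # z[l+2] = W[l+2] a[l+1] + b[l+2]
        if self.residual:
            return F.relu(z2 + a_l)         # a[l+2] = g(z[l+2] + a[l])  (residual block)
        return F.relu(z2)                   # a[l+2] = g(z[l+2])         (plain block)
```

The only difference between the two blocks is the single addition of a[l] before the final ReLU.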
Skip Connections
• Connections that skip over one or more layers.
• The skipped-over layers are referred to as the residual part of the network.
• The input is added to the output of the residual part via the shortcut – usually the dimensions already match, so an identity shortcut works.
• Another option is to project the input to the output space (e.g., a 1×1 convolution; sketched below).
• The identity shortcut adds no training parameters; the projection adds only a small number.
Image source: towardsdatascience.com
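A small sketch of the two shortcut options, assuming PyTorch; the helper name `make_shortcut` is hypothetical, not from the paper.

```python
import torch.nn as nn

def make_shortcut(in_ch, out_ch, stride=1):
    """Identity when shapes already match; otherwise project with a 1x1 conv.
    The identity path is parameter-free; the projection adds a few weights."""
    if stride == 1 and in_ch == out_ch:
        return nn.Identity()
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
        nn.BatchNorm2d(out_ch))
```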
ResNet Architecture
Image source: paper
ResNet Architecture
Image source: paper
Stacked Residual Blocks
ResNet Architecture
Image source: paper
• 3×3 conv layers
• Between stages: 2× the number of filters, stride 2 to down-sample
• Average pooling after the last conv layer
• FC layer to the output classes
(a stage-by-stage sketch follows this slide)
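A compact PyTorch sketch of this stage layout (stacked residual blocks, filters doubled and spatial size halved between stages, global average pooling, one FC layer). The block counts follow a ResNet-18-style configuration; class names and details such as omitted BatchNorm on the projection are simplifications, not the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 convs plus an identity/projection shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.proj = (nn.Conv2d(in_ch, out_ch, 1, stride, bias=False)
                     if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        return F.relu(self.bn2(self.conv2(out)) + self.proj(x))

class TinyResNet(nn.Module):
    """Filters double and spatial size halves (stride 2) between stages,
    then global average pooling and a single FC layer to the classes."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(                     # 7x7/2 conv + 3x3/2 max pool
            nn.Conv2d(3, 64, 7, 2, 3, bias=False), nn.BatchNorm2d(64),
            nn.ReLU(inplace=True), nn.MaxPool2d(3, 2, 1))
        self.stage1 = self._stage(64, 64, blocks=2, stride=1)
        self.stage2 = self._stage(64, 128, blocks=2, stride=2)    # 2x filters, /2 size
        self.stage3 = self._stage(128, 256, blocks=2, stride=2)
        self.stage4 = self._stage(256, 512, blocks=2, stride=2)
        self.pool = nn.AdaptiveAvgPool2d(1)            # average pool after last conv
        self.fc = nn.Linear(512, num_classes)          # FC layer to output classes

    @staticmethod
    def _stage(in_ch, out_ch, blocks, stride):
        layers = [BasicBlock(in_ch, out_ch, stride)]
        layers += [BasicBlock(out_ch, out_ch) for _ in range(blocks - 1)]
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.stage4(self.stage3(self.stage2(self.stage1(self.stem(x)))))
        return self.fc(torch.flatten(self.pool(x), 1))
```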
ResNet Architecture
Input: 28×28×256
• 1×1 conv with 64 filters → 28×28×64
• 3×3 conv on the 64 feature maps only
• 1×1 conv with 256 filters → 28×28×256
BOTTLENECK design (sketched below)
Image source: paper
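A minimal sketch of the 1×1 → 3×3 → 1×1 bottleneck described above, assuming PyTorch; the 256/64 channel counts match the 28×28 stage on this slide, while the BatchNorm placement and naming are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 restore, with an identity shortcut around all three."""
    def __init__(self, channels=256, bottleneck=64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, 1, bias=False)               # 256 -> 64
        self.conv3x3 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False) # 3x3 on 64 maps only
        self.restore = nn.Conv2d(bottleneck, channels, 1, bias=False)              # 64 -> 256
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):                             # x: (N, 256, 28, 28)
        out = F.relu(self.bn1(self.reduce(x)))        # (N, 64, 28, 28)
        out = F.relu(self.bn2(self.conv3x3(out)))     # (N, 64, 28, 28)
        out = self.bn3(self.restore(out))             # (N, 256, 28, 28)
        return F.relu(out + x)                        # add the identity shortcut
```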
Summary | Advantages
Benefits of Bottleneck
• Less training time for deeper networks.
• Time complexity is kept about the same as a two-layer block of 3×3 convs (see the check below).
• Hence, the number of layers can be increased.
• And the model converges faster; in cost terms, the 152-layer ResNet has 11.3 billion FLOPs, while VGG-16/19 have 15.3/19.6 billion FLOPs.
Input: 28×28×256
Image source: paper
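A back-of-the-envelope check of the complexity claim, assuming the channel counts from the 28×28 stage above and ignoring biases and BatchNorm: the 3-layer bottleneck on 256-d features costs roughly the same as a 2-layer block of 3×3 convs on 64-d features.

```python
# Rough weight count per block (multiply-adds per spatial position scale the same way).
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

# Two-layer "basic" block on 64-channel features: 3x3, 64->64, twice.
basic = 2 * conv_params(3, 64, 64)              # 73,728 weights

# Bottleneck on 256-channel features: 1x1 256->64, 3x3 64->64, 1x1 64->256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))        # 69,632 weights

print(basic, bottleneck)  # similar cost, but the bottleneck operates on 4x wider features,
                          # so depth can grow without blowing up the total compute
```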
Summary – Advantages of ResNet over Plain Networks
• A deeper plain network tends to perform badly because of vanishing and exploding gradients.
• In such cases, a ResNet stops improving rather than degrading:
  a[l+2] = g(z[l+2] + a[l]) = g(W[l+2] a[l+1] + b[l+2] + a[l])
• If a layer is not 'useful', L2 regularization will bring its parameters very close to zero, resulting in
  a[l+2] = g(a[l]) = a[l]  (when using ReLU; see the check below)
• In theory a ResNet can represent the same functions as a plain network, but in practice, because of the above, convergence is much faster.
• Identity shortcuts introduce no additional training parameters or complexity.
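A tiny numeric check of the claim above, assuming PyTorch and arbitrary shapes: if regularization drives the block's weights and biases to (near) zero, the residual block falls back to the identity for non-negative (post-ReLU) inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer1, layer2 = nn.Linear(8, 8), nn.Linear(8, 8)
for p in list(layer1.parameters()) + list(layer2.parameters()):
    nn.init.zeros_(p)                       # pretend L2 regularization shrank W, b to ~0

a_l = F.relu(torch.randn(4, 8))             # previous activation, non-negative after ReLU
a_l2 = F.relu(layer2(F.relu(layer1(a_l))) + a_l)   # residual block output
print(torch.allclose(a_l2, a_l))            # True: the block behaves as an identity mapping
```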
Results
Results
• ILSVRC 2015 classification winner (3.57% top-5 error) – better than "human performance"!
Error rates (%) of ensembles. The top-5 error is on the
test set of ImageNet and reported by the test server
Results
Error rates (%, 10-crop testing) on ImageNet
validation set
Error rates (%) of single-model results on
the ImageNet validation set
Plain vs. ResNet
Image source: paper
Plain vs. Deeper ResNet
Image source: paper
Conclusion | Future Trends
Conclusion
• Deep neural networks become easy to optimize.
• Accuracy gains from substantially increased depth.
• Addressed: vanishing gradients and long training times.
Future Trends
• Identity Mappings in Deep Residual Networks proposes passing the input directly through the residual blocks, allowing the network to learn identity mappings easily in both the forward and backward passes (He et al., 2016). A pre-activation sketch follows this slide.
• Using Batch Normalization as pre-activation improves regularization.
• Reduced training time with random layer drops (stochastic depth).
• ResNeXt: Aggregated Residual Transformations for Deep Neural Networks (Xie et al., 2016).
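A sketch of the pre-activation ordering from He et al. (2016), where BN and ReLU come before each conv so the shortcut path stays a pure identity. PyTorch; the class name and channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Identity-mapping variant: BN -> ReLU -> conv before each weight layer,
    and nothing (not even ReLU) applied after the addition."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))   # pre-activation: BN -> ReLU -> conv
        out = self.conv2(F.relu(self.bn2(out)))
        return out + x                          # clean identity shortcut in both passes
```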
Questions?