Deep Residual Learning for
Image Recognition
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
Presented by – Sanjay Saha, School of Computing, NUS
CS6240 – Multimedia Analysis – Sem 2 AY2019/20
Objective | Problem Statement
Motivation
Performance of plain networks in a deeper architecture: deeper plain nets show higher training and test error than their shallower counterparts
Image source: paper
Main Idea
• Skip Connections / Shortcuts
• Trying to avoid: ‘vanishing gradients’ and ‘long training times’
Image source: Wikipedia
Contributions | Problem Statement
• These extremely deep residual nets are easy to optimize, but the
counterpart “plain” nets (that simply stack layers) exhibit higher
training error when the depth increases.
• These deep residual nets can easily enjoy accuracy gains from greatly
increased depth, producing results substantially better than previous
networks.
A residual learning framework to ease the training of networks that
are substantially deeper than those used previously.
[Sketch: performance vs. depth]
Literature
Literature Review
• Partial solutions for vanishing gradients:
• Batch Normalization – rescales the activations over each mini-batch.
• Smart initialization of weights – for example, Xavier initialization.
• Training portions of the network individually.
• Highway Networks
• Use gated residual connections of the form
𝑌 = 𝑓(𝑥) × 𝑠𝑖𝑔𝑚𝑜𝑖𝑑(𝑊𝑥 + 𝑏) + 𝑥 × (1 − 𝑠𝑖𝑔𝑚𝑜𝑖𝑑(𝑊𝑥 + 𝑏))
• Data-dependent gated shortcuts with parameters (see the gate sketch below).
• When gates are ‘closed’, the layers become ‘non-residual’.
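To make the gating concrete, here is a minimal sketch of one highway layer, assuming a fully connected transform with ReLU for 𝑓; the names and sizes are illustrative, not the original authors' code:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One highway layer: y = f(x) * g + x * (1 - g), with g = sigmoid(Wx + b)."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # the layer's own transform f
        self.gate = nn.Linear(dim, dim)       # Wx + b feeding the gate

    def forward(self, x):
        g = torch.sigmoid(self.gate(x))       # data-dependent gate
        f = torch.relu(self.transform(x))     # f(x)
        return f * g + x * (1.0 - g)          # gate 'closed' (g -> 0) means y -> x

x = torch.randn(8, 32)
print(HighwayLayer(32)(x).shape)  # torch.Size([8, 32])
```

When the gate saturates at 1, the shortcut disappears and the layer behaves like a plain (non-residual) layer; the gate itself always costs extra parameters, which is exactly what ResNet's identity shortcut avoids.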
ResNet | Design | Architecture
Plain Block
𝑎[𝑙] → 𝑎[𝑙+1] → 𝑎[𝑙+2]
𝑧[𝑙+1] = 𝑊[𝑙+1]𝑎[𝑙] + 𝑏[𝑙+1]  (“linear”)
𝑎[𝑙+1] = 𝑔(𝑧[𝑙+1])  (“relu”)
𝑧[𝑙+2] = 𝑊[𝑙+2]𝑎[𝑙+1] + 𝑏[𝑙+2]  (“output”)
𝑎[𝑙+2] = 𝑔(𝑧[𝑙+2])  (“relu on output”)
Image source: deeplearning.ai
Residual Block
𝑎[𝑙] → 𝑎[𝑙+1] → 𝑎[𝑙+2]
𝑧[𝑙+1] = 𝑊[𝑙+1]𝑎[𝑙] + 𝑏[𝑙+1]  (“linear”)
𝑎[𝑙+1] = 𝑔(𝑧[𝑙+1])  (“relu”)
𝑧[𝑙+2] = 𝑊[𝑙+2]𝑎[𝑙+1] + 𝑏[𝑙+2]  (“output”)
𝑎[𝑙+2] = 𝑔(𝑧[𝑙+2] + 𝑎[𝑙])  (“relu on output plus input”)
Image source: deeplearning.ai
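Contrasting the two blocks in code makes the point concrete: in this minimal NumPy sketch (shapes and the ReLU choice for 𝑔 are illustrative), the residual block changes exactly one line:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def plain_block(a_l, W1, b1, W2, b2):
    z1 = W1 @ a_l + b1          # z[l+1] = W[l+1] a[l] + b[l+1]
    a1 = relu(z1)               # a[l+1] = g(z[l+1])
    z2 = W2 @ a1 + b2           # z[l+2] = W[l+2] a[l+1] + b[l+2]
    return relu(z2)             # a[l+2] = g(z[l+2])

def residual_block(a_l, W1, b1, W2, b2):
    z1 = W1 @ a_l + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_l)       # the only change: add the skip input a[l]

rng = np.random.default_rng(0)
d = 16
a = relu(rng.standard_normal(d))                 # post-ReLU activations
W1, W2 = rng.standard_normal((2, d, d)) * 0.1
b1 = b2 = np.zeros(d)
print(residual_block(a, W1, b1, W2, b2).shape)   # (16,)
```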
Skip Connections
• Connections that skip over one or more layers.
• The skipped layers are referred to as the residual part of the network.
• The input is added element-wise to the output of the residual part – so the dimensions usually have to match.
• Another option is to use a projection to the output space when they do not.
• The identity shortcut adds no training parameters; a projection shortcut adds only a small number. (A sketch of both follows.)
Image source: towardsdatascience.com
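A minimal PyTorch sketch (not the authors' code; channel counts are illustrative) of the two shortcut options – a parameter-free identity when shapes match, and a 1x1 projection when the block changes width or resolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        if stride != 1 or in_ch != out_ch:
            # projection shortcut: a 1x1 conv matches the shape, adding few parameters
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()  # parameter-free identity shortcut

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # add the skip, then ReLU

x = torch.randn(1, 64, 28, 28)
same = ResidualBlock(64, 64)              # identity shortcut
down = ResidualBlock(64, 128, stride=2)   # projection shortcut
print(same(x).shape, down(x).shape)       # [1,64,28,28] and [1,128,14,14]
```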
ResNet Architecture
Stacked residual blocks
Image source: paper
ResNet Architecture
• 3x3 conv layers
• 2x the number of filters whenever the feature map size halves
• Stride-2 convolutions to down-sample
• Average pooling after the last conv layer
• FC layer to output classes (see the stage sketch below)
Image source: paper
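These notes translate directly into code. A condensed sketch (illustrative, not the paper's reference implementation), reusing the ResidualBlock defined earlier; the 2-2-2-2 block layout happens to mirror the paper's 18-layer configuration:

```python
import torch
import torch.nn as nn

def make_stage(in_ch, out_ch, num_blocks, stride):
    # first block may down-sample (stride 2) and double the filters;
    # remaining blocks keep the shape and use identity shortcuts
    blocks = [ResidualBlock(in_ch, out_ch, stride)]
    blocks += [ResidualBlock(out_ch, out_ch) for _ in range(num_blocks - 1)]
    return nn.Sequential(*blocks)

class TinyResNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(  # 7x7/2 conv + 3x3/2 max-pool, as in the paper
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))
        self.stages = nn.Sequential(
            make_stage(64, 64, 2, stride=1),
            make_stage(64, 128, 2, stride=2),    # halve resolution, double filters
            make_stage(128, 256, 2, stride=2),
            make_stage(256, 512, 2, stride=2))
        self.pool = nn.AdaptiveAvgPool2d(1)      # average pool after the last conv
        self.fc = nn.Linear(512, num_classes)    # FC layer to output classes

    def forward(self, x):
        x = self.pool(self.stages(self.stem(x)))
        return self.fc(torch.flatten(x, 1))

print(TinyResNet()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```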
ResNet Architecture: Bottleneck Block
Input: 28x28x256
• 1x1 conv with 64 filters → 28x28x64
• 3x3 conv on the 64 feature maps only → 28x28x64
• 1x1 conv with 256 filters → 28x28x256
BOTTLENECK
Image source: paper
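The same block as a minimal PyTorch sketch (BN placement follows the paper's conv-BN-ReLU ordering; the shortcut is identity since input and output are both 28x28x256):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 restore, with an identity shortcut."""
    def __init__(self, channels=256, mid=64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, mid, 1, bias=False)       # 256 -> 64
        self.conv3 = nn.Conv2d(mid, mid, 3, padding=1, bias=False)  # 3x3 on 64 maps only
        self.restore = nn.Conv2d(mid, channels, 1, bias=False)      # 64 -> 256
        self.bn1, self.bn2 = nn.BatchNorm2d(mid), nn.BatchNorm2d(mid)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.reduce(x)))   # 28x28x64
        out = F.relu(self.bn2(self.conv3(out)))  # 28x28x64
        out = self.bn3(self.restore(out))        # 28x28x256
        return F.relu(out + x)                   # add identity, then ReLU

x = torch.randn(1, 256, 28, 28)
print(Bottleneck()(x).shape)  # torch.Size([1, 256, 28, 28])
```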
Summary | Advantages
Benefits of Bottleneck
• Less training time for deeper networks, by keeping the time complexity about the same as a two-layer 3x3 conv block.
• Hence, allows increasing the number of layers.
• And the model is computationally cheaper: the 152-layer ResNet has 11.3 billion FLOPs, while the VGG-16/19 nets have 15.3/19.6 billion FLOPs.
Image source: paper
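A back-of-the-envelope parameter count (my own arithmetic, bias terms ignored) makes the "same time complexity" claim concrete:

```latex
\text{two } 3\times 3 \text{ convs on } 64\text{-d}: \quad 2\,(3^2 \cdot 64 \cdot 64) \approx 73.7\text{K}
\text{bottleneck, } 256\text{-d in/out}: \quad 1^2\cdot 256\cdot 64 \;+\; 3^2\cdot 64\cdot 64 \;+\; 1^2\cdot 64\cdot 256 \approx 69.6\text{K}
\text{two } 3\times 3 \text{ convs directly on } 256\text{-d}: \quad 2\,(3^2 \cdot 256 \cdot 256) \approx 1.18\text{M}
```

So the three-layer bottleneck matches the cost of the two-layer 64-d block while reading and writing 256-d features – roughly 17x cheaper than stacking 3x3 convs at 256-d. Per-position FLOPs scale the same way (multiply each count by 28×28).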
Summary – Advantages of ResNet over Plain Networks
• A deeper plain network tends to perform badly because of vanishing and exploding gradients.
• In such cases, ResNets stop improving rather than degrade in performance: 𝑎[𝑙+2] = 𝑔(𝑧[𝑙+2] + 𝑎[𝑙]) = 𝑔(𝑊[𝑙+2]𝑎[𝑙+1] + 𝑏[𝑙+2] + 𝑎[𝑙])
• If a layer is not ‘useful’, L2 regularization will bring its parameters very close to zero, resulting in 𝑎[𝑙+2] = 𝑔(𝑎[𝑙]) = 𝑎[𝑙] (when using ReLU, since 𝑎[𝑙] ≥ 0) – see the numeric check below.
• In theory, a ResNet has the same representational capacity as its plain counterpart, but in practice, due to the above, convergence is much faster.
• No additional training parameters or complexity introduced.
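A tiny numeric check of that collapse-to-identity argument (a one-layer simplification of the block, with weights driven to zero standing in for a "not useful" layer):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

a_l = relu(np.random.randn(8))         # post-ReLU input, so a_l >= 0
W, b = np.zeros((8, 8)), np.zeros(8)   # L2-regularized "useless" layer -> ~0

a_next = relu(W @ a_l + b + a_l)       # residual block with zeroed weights
print(np.allclose(a_next, a_l))        # True: the block reduces to the identity
```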
Results
Results
• ILSVRC 2015 classification winner (3.6% top-5 error) – better than “human performance”!
Error rates (%) of ensembles. The top-5 error is on the ImageNet test set, as reported by the test server.
Results
Error rates (%, 10-crop testing) on the ImageNet validation set
Error rates (%) of single-model results on the ImageNet validation set
Plain vs. ResNet
Image source: paper
Plain vs. Deeper ResNet
Image source: paper
Conclusion | Future Trends
Conclusion
• Deep neural networks that are easy to optimize.
• Accuracy gains from substantially increased depth.
• Addressed: vanishing gradients and long training times.
Future Trends
• Identity Mappings in Deep Residual Networks suggests passing the input directly to the final residual layer through pure identity shortcuts, allowing the network to propagate the identity mapping in both the forward and backward passes. (He et al. 2016)
• Using Batch Normalization as pre-activation improves regularization – a sketch follows.
• Reduce training time with random layer drops (stochastic depth).
• ResNeXt: Aggregated Residual Transformations for Deep Neural Networks. (Xie et al. 2016)
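A minimal sketch of the pre-activation ordering from He et al. 2016 (BN and ReLU moved before each conv, leaving a clean identity path; sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual block: BN -> ReLU -> conv, twice."""
    def __init__(self, ch):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))   # BN/ReLU before the conv
        out = self.conv2(F.relu(self.bn2(out)))
        return out + x  # pure identity shortcut, no ReLU after the addition

x = torch.randn(1, 64, 28, 28)
print(PreActBlock(64)(x).shape)  # torch.Size([1, 64, 28, 28])
```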
Questions?