1. The role of overparameterization and optimization in CNN denoisers
Julián Tachella, Junqi Tang and Mike Davies
School of Engineering
University of Edinburgh
TOPML Conference
https://arxiv.org/abs/2006.02379
2. The deep learning era
CNNs offer state-of-the-art image denoising
• Not well understood…
• Can we use the neural tangent kernel [Jacot et al.] insight to improve our understanding?
[Figure: restored image, DnCNN [Zhang et al., 2017]]
3. The big mystery
Deep image prior [Ulyanov et al., 2018]
Network: 2M parameters
Single target image: 49k pixels
“Self-supervised” loss: $\|z(w) - y\|_2^2$
Early-stopping consistently provides SOTA
No explicit regularization, highly overparameterized!
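As a concrete illustration, here is a minimal PyTorch sketch of a DIP-style early-stopped loop; the tiny network, optimizer settings, and stopping iteration are illustrative stand-ins, not the settings of Ulyanov et al.:

```python
# Minimal DIP-style self-supervised denoising loop (sketch; hyperparameters
# and the tiny network are illustrative, not the paper's ~2M-parameter U-net).
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)
y = torch.rand(1, 3, 128, 128)             # noisy target image y
z = torch.randn(1, 3, 128, 128)            # fixed network input (noise)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for t in range(500):                       # early stopping: do NOT run to
    opt.zero_grad()                        # convergence
    loss = ((net(z) - y) ** 2).sum()       # self-supervised loss ||z(w) - y||_2^2
    loss.backward()
    opt.step()
restored = net(z).detach()                 # take the early-stopped output
```

Running the loop to convergence would drive the loss to zero and simply reproduce the noisy y; stopping early is the only regularization.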
5. Image input
Assuming overparameterization + GD training
• NTK = non-local filter (e.g. non-local means)
• Patch similarity function in closed form
• Training = iterative image denoising (twicing)
• Leading eigenvectors capture ‘clean’ signal
• Efficient low-rank Nyström approximation of the NTK filter $\eta_\Theta$ (see the sketch below)
GD-trained CNN: 800 s vs. closed-form Nyström NTK: 3 s
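A minimal sketch of where that speedup comes from, assuming a placeholder kernel(rows, cols) function that returns the requested sub-matrix of the closed-form NTK patch similarity (the name and signature are hypothetical):

```python
# Nyström low-rank approximation of the n-by-n kernel filter (sketch).
# `kernel(rows, cols)` is a placeholder returning the sub-matrix K[rows][:, cols];
# the paper's closed-form NTK patch similarity would go there.
import numpy as np

def nystrom_filter(y, kernel, m=200, seed=0):
    """Filter a flattened image y of shape (n,) with K ~= C @ pinv(W) @ C.T."""
    n = y.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=m, replace=False)   # m landmark pixels
    C = kernel(np.arange(n), idx)                # (n, m) kernel slice
    W = C[idx]                                   # (m, m) landmark block
    Winv = np.linalg.pinv(W)
    x = C @ (Winv @ (C.T @ y))                   # filtered image
    norm = C @ (Winv @ (C.T @ np.ones(n)))       # row sums of the approx. kernel
    return x / np.maximum(norm, 1e-12)           # normalise filter rows to sum to 1
```

The point of the design is that only the (n, m) slice C and the (m, m) block W are ever formed, so the cost scales with the number of landmarks m rather than with n².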
7. Noise input
NTK theory: the filter does not depend on the image in any way…
• NTK is low-pass [Heckel, 2020]
• DIP smoothing kernels [Cheng, 2019]
Vanilla CNN:
Autoencoder: filter = $1/d$
This cannot be obtained via low-pass filtering!
The DIP does not use GD, but Adam
Not well described by the NTK, as $\sup_t \|w_t - w_0\|_2 = \mathcal{O}(1)$
10. Thanks for your attention!
Tachella.github.io
Codes
Presentations
To appear at CVPR 2021
https://arxiv.org/abs/2006.02379
Editor's Notes
State-of-the-art image denoisers are CNNs.
They seem to require many weights (500k–2M),
lots of training data (so are they susceptible to domain shift?),
and lots of training time.
Is this correct? What exactly do they learn?
That is the big mystery
The CNN has 2M parameters compared to the image's 49k pixels.
That means the cost function typically has a set of zero-error global minima of dimension ~1.95M… lots of solutions,
…all of which just output the original noisy image.
… So how come early stopping of training consistently provides SOTA performance,
… and not just in denoising but in other image processing problems too, such as inpainting?
As a simple example, let's consider a CNN with a single hidden layer.
The r×r convolutions and nonlinearity compute a non-local patch-based affinity matrix with the following kernel.
While this patch-based similarity metric is different to that of, say, NLM, if we normalise the patches we can see that it has a very similar form
(though I have chosen parameters judiciously to maximise the similarity).
Ultimately this means we can directly compute the NTK and do filtering
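To make this concrete, here is a minimal sketch using the classical NLM-style Gaussian patch kernel as a stand-in for the paper's closed-form NTK similarity (patch size r, bandwidth h, and step size eta are illustrative), together with the "training = iterative denoising (twicing)" iteration from the slides:

```python
# Non-local patch-affinity filtering with twicing iterations (sketch).
# The Gaussian patch kernel below is the classical NLM form, used here as a
# stand-in for the closed-form NTK patch similarity of the paper.
import numpy as np

def patch_affinity(img, r=3, h=0.1):
    """Dense affinity between all r x r patches of a (small) grayscale image."""
    H, W = img.shape
    pad = r // 2
    padded = np.pad(img, pad, mode="reflect")
    patches = np.stack([padded[i:i + r, j:j + r].ravel()
                        for i in range(H) for j in range(W)])   # (n, r*r)
    d2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h ** 2))
    return K / K.sum(axis=1, keepdims=True)      # row-normalised non-local filter

def twicing(y, K, steps=3, eta=0.5):
    """Iterative filtering x_{k+1} = x_k + eta * K (y - x_k): 'training as
    iterative image denoising' in the NTK picture (eta is illustrative)."""
    x = np.zeros_like(y)
    for _ in range(steps):
        x = x + eta * (K @ (y - x))
    return x

y = np.random.rand(32, 32)       # toy noisy image (keep small: O(n^2) kernel)
K = patch_affinity(y)
x = twicing(y.ravel(), K).reshape(32, 32)
```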
If we now look at what the images look like, we see in particular that GD training with noise as input acts as a crude low-pass filter (particularly for the U-net).
In the other cases (noise + Adam, or image + GD/Adam) we have a good estimate and certainly no low-pass filtering: all images preserve detailed structure.
(σ = 25, PSNR = 20.18 dB, CBM3D = 33.03 dB)
Similarly, when we look at the change in weights we see broadly what we predicted…
First, the ℓ2 change in weights for Adam is $\mathcal{O}(1)$, hence not in the NTK regime.
In contrast, for GD the weight change decays roughly as $C^{-1/2}$.
Next, looking at the ℓ∞ norm of the weight change, we see that individually all weights have a change that decays with the number of channels, suggesting that each weight provides a similar small contribution to the solution (in contrast with convolutional sparse coding arguments, where only a few weights contribute significantly).
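A minimal sketch of the diagnostic behind these notes, on a hypothetical toy setup: track $\sup_t \|w_t - w_0\|_2$ and the largest single-weight change during self-supervised training, then swap SGD for Adam (or vary the channel count C) to compare regimes:

```python
# Sketch: diagnosing the lazy/NTK regime by tracking weight movement.
# The toy CNN, image size, and optimizer settings are all illustrative.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 1, 3, padding=1))      # toy CNN, C = 64 channels
y = torch.rand(1, 1, 64, 64)                             # noisy target
z = torch.randn(1, 1, 64, 64)                            # fixed noise input
opt = torch.optim.SGD(net.parameters(), lr=1e-2)         # swap in Adam to compare

w0 = torch.cat([p.detach().flatten().clone() for p in net.parameters()])
sup_l2 = sup_linf = 0.0
for t in range(1000):
    opt.zero_grad()
    ((net(z) - y) ** 2).sum().backward()
    opt.step()
    w = torch.cat([p.detach().flatten() for p in net.parameters()])
    sup_l2 = max(sup_l2, (w - w0).norm().item())          # sup_t ||w_t - w_0||_2
    sup_linf = max(sup_linf, (w - w0).abs().max().item()) # largest per-weight change

# An l2 movement that stays O(1) as width grows (as with Adam in the notes)
# suggests training has left the NTK regime; decay with C matches the GD case.
print(f"sup l2 change = {sup_l2:.3f}, sup linf change = {sup_linf:.3g}")
```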