The document discusses several image enhancement techniques:
1. WCT2, which uses wavelet transforms for photorealistic style transfer, achieving faster and lighter models than previous techniques.
2. CutBlur, a new data augmentation method that improves performance on super-resolution and other low-level vision tasks by cutting a low-resolution patch and pasting it into the corresponding high-resolution region (and vice versa).
3. SimUSR, a simple but strong baseline for unsupervised super-resolution that achieves state-of-the-art results using only low-resolution images during training.
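The cut-and-paste idea behind CutBlur can be sketched in a few lines. The snippet below is a minimal illustration, assuming the low-resolution image has already been upsampled to the high-resolution size; the function and parameter names are ours, not from the paper's code:

```python
import numpy as np

def cutblur(hr, lr_up, ratio=0.5, rng=None):
    """CutBlur-style augmentation (sketch): paste a low-resolution patch
    into the high-resolution image. `lr_up` is the LR image already
    upsampled to the HR size; `ratio` sets the patch side length."""
    rng = rng or np.random.default_rng()
    h, w = hr.shape[:2]
    ch, cw = int(h * ratio), int(w * ratio)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    out = hr.copy()
    out[y:y+ch, x:x+cw] = lr_up[y:y+ch, x:x+cw]  # HR image with an LR patch inside
    return out
```

The reverse direction (an HR patch inside the LR-upsampled image) is obtained by swapping the two arguments.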
A beginner's guide to Style Transfer and recent trends - JaeJun Yoo
Style transfer techniques have evolved from matching gram matrices to using neural networks. Early methods matched gram statistics of CNN features to transfer texture styles. Recent work uses adaptive instance normalization and feed-forward networks. WCT2 achieves photorealistic transfer using wavelet transforms that satisfy the perfect reconstruction condition, enabling high resolution stylization and temporal consistency in videos without post-processing.
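The adaptive instance normalization step mentioned above can be sketched compactly: it re-normalizes each channel of the content features to the style features' per-channel statistics. A minimal NumPy illustration (names are ours, not from any particular implementation):

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization (sketch): align the per-channel
    mean/std of the content features to those of the style features.
    Both inputs are (C, H, W) feature maps from some encoder."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # normalize content statistics, then re-scale to style statistics
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

A decoder then maps the re-statisticized features back to image space; WCT2 additionally replaces pooling/unpooling with wavelet transforms so that this round trip is lossless.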
Super resolution in deep learning era - JaeJun Yoo
1) The document discusses super-resolution techniques in deep learning, including inverse problems, image restoration problems, and different deep learning models.
2) Early models like SRCNN used convolutional networks for super-resolution but were shallow, while later models incorporated residual learning (VDSR), recursive learning (DRCN), and became very deep and dense (SRResNet).
3) Key developments included EDSR which provided a strong backbone model and GAN-based approaches like SRGAN which aimed to generate more realistic textures but require new evaluation metrics.
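The residual-learning idea behind VDSR can be illustrated in a few lines: the network is trained to predict only the difference between the interpolated input and the ground truth. A toy sketch, with nearest-neighbor upscaling standing in for the bicubic interpolation actually used:

```python
import numpy as np

def upscale_nn(lr, scale):
    """Stand-in for bicubic interpolation: nearest-neighbor upscaling."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)

def residual_target(hr, lr, scale):
    """VDSR-style training target: the network regresses hr - upscale(lr),
    which is mostly near zero and easier for deep models to learn than hr."""
    return hr - upscale_nn(lr, scale)

def reconstruct(lr, predicted_residual, scale):
    """At test time, add the predicted residual back onto the upscaled input."""
    return upscale_nn(lr, scale) + predicted_residual
```

With a perfect residual prediction the reconstruction recovers the HR image exactly, which is what makes the residual a well-posed regression target.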
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Benchmarks - Jinwon Lee
This is the 258th paper review for the TensorFlow Korea PR12 reading group.
The paper is From ImageNet to Image Classification: Contextualizing Progress on Benchmarks, from MIT.
Anyone working in deep learning knows ImageNet; this paper discusses the limitations and problems of ImageNet's labeling process and points out that top-1-accuracy-based evaluation can also be problematic.
More than 20% of ImageNet images contain multiple objects, yet only one of them is accepted as the correct answer, and due to limitations of the annotation process many images are labeled with a class different from what a person would actually choose. There are also many labels that are hard to judge without expert knowledge, such as the more than 20 terrier breeds. Through a variety of experiments, the paper combines quantitative analysis with human-in-the-loop evaluation to assess how far current models have actually come, and what data-labeling challenges must be solved to push performance further. The paper is fairly long but light on technical content, so it reads easily; if you want the details, please see the video!
Paper link: https://arxiv.org/abs/2005.11295
Video link: https://youtu.be/CPMgX5ikL_8
This is the 243rd paper review for the TensorFlow Korea PR12 reading group.
The paper is Designing Network Design Spaces from Facebook AI Research, better known as RegNet.
When designing a CNN, are bottleneck layers really beneficial? Do more layers always mean higher accuracy? When the activation map's width and height are halved (stride 2 or pooling), the channels are conventionally doubled, but is that really the best choice? Might a network without bottleneck layers do better, is there a magic number of layers that maximizes performance, and might tripling the channels beat doubling them when the activations shrink by half?
Rather than designing a single good neural network, this paper is about designing a good design space: a space in which techniques like AutoML can find good networks, i.e., a space where good networks live. It proposes a human-in-the-loop procedure that progressively narrows an almost unconstrained design space down to a good one. The video below shows which design space produced RegNet, which outperforms EfficientNet, and which of the design choices we took for granted turn out to be wrong.
Video link: https://youtu.be/bnbKQRae_u4
Paper link: https://arxiv.org/abs/2003.13678
Deep learning for image super resolution - Prudhvi Raj
Using deep convolutional networks, the machine can learn an end-to-end mapping between low- and high-resolution images. Unlike traditional methods, this approach jointly optimizes all layers. A lightweight CNN structure is used, which is simple to implement and offers a favorable trade-off compared to existing methods.
Explores the type of structure learned by Convolutional Neural Networks, the applications where they're most valuable and a number of appropriate mental models for understanding deep learning.
Chen, X., & He, K. (2021). Exploring Simple Siamese Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 15750-15758).
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector - Jinwon Lee
This is the 270th paper review for the TensorFlow Korea PR12 reading group.
The paper is PP-YOLO: An Effective and Efficient Implementation of Object Detector, from Baidu. By applying a range of techniques to YOLOv3, it catches both rabbits at once: very high accuracy and very fast speed. The review takes a deeper look at the various tricks used in the paper. If you are interested in object detection techniques such as deformable convolution, exponential moving average, DropBlock, IoU-aware prediction, grid sensitivity elimination, Matrix NMS, or CoordConv, the video and slides are worth a look!
Paper link: https://arxiv.org/abs/2007.12099
Video link: https://youtu.be/7v34cCE5H4k
This paper targets accurate 3D hand pose estimation from a single depth map. 3D hand pose estimation is a key capability for technologies such as HCI and AR. Many researchers have proposed methods to improve accuracy, but the similar appearance of fingers, occlusion, and the complexity of diverse finger motions have limited progress. To overcome the limitations of prior work, this paper changes both the input and output representations that existing methods use: unlike most prior methods, which take a 2D depth image and directly regress the 3D coordinates of the hand joints, the proposed model takes a 3D voxelized depth map as input and outputs 3D heatmaps, using an encoder-decoder 3D CNN. Thanks to these changed representations, the model achieves the highest accuracy on three widely used 3D hand pose estimation datasets and one 3D human pose estimation dataset, and won the HANDS 2017 challenge held at ICCV 2017.
We present a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images.
To deal with intra-class appearance and shape variations that commonly exist among different instances within the same object category,
we leverage a pyramidal model where affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed within deep networks.
PARN estimates residual affine transformations at each level and composes them to estimate final affine transformations.
Furthermore, to overcome the limitations of insufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervisions by leveraging a correspondence consistency across image pairs.
Our method is fully learnable in an end-to-end manner and does not require quantizing infinite continuous affine transformation fields.
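The level-by-level composition of residual affine transformations can be written out in homogeneous coordinates; the sketch below shows only the composition step, not the regression networks themselves, and the helper names are ours:

```python
import numpy as np

def to_homogeneous(A, t):
    """Pack a 2x2 linear part and a 2-vector translation into a 3x3 matrix."""
    M = np.eye(3)
    M[:2, :2] = A
    M[:2, 2] = t
    return M

def compose_affines(transforms):
    """Coarse-to-fine composition (sketch): each pyramid level contributes
    a residual affine (A, t); the final transform is their product,
    applied coarsest-first."""
    M = np.eye(3)
    for A, t in transforms:
        M = to_homogeneous(A, t) @ M
    return M
```

Because each level only refines the previous estimate, the smoothness of the coarse levels carries over to the fine ones, which is the pyramidal regularization described above.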
PR-231: A Simple Framework for Contrastive Learning of Visual Representations - Jinwon Lee
The document presents SimCLR, a framework for contrastive learning of visual representations using simple data augmentation. Key aspects of SimCLR include using random cropping and color distortions to generate positive sample pairs for the contrastive loss, a nonlinear projection head to learn representations, and large batch sizes. Evaluation shows SimCLR learns representations that outperform supervised pretraining on downstream tasks and achieves state-of-the-art results with only view augmentation and contrastive loss.
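The contrastive (NT-Xent) loss at the heart of SimCLR is compact enough to sketch in NumPy; this is an illustrative reimplementation of the loss only, not the authors' code:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss (sketch). z1[i] and z2[i] are embeddings of the two
    augmented views of example i; every other sample in the batch serves
    as a negative. Lower loss = positive pairs are more similar than negatives."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarities
    sim = z @ z.T / tau
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # a sample is not its own pair
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * n), targets]))
```

Perfectly aligned positive pairs minimize the loss; anti-aligned pairs maximize it, which is what drives the two views of the same image together in embedding space.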
Self-supervised learning uses unlabeled data to learn visual representations through pretext tasks like predicting relative patch location, solving jigsaw puzzles, or image rotation. These tasks require semantic understanding to solve but only use unlabeled data. The features learned through pretraining on pretext tasks can then be transferred to downstream tasks like image classification and object detection, often outperforming supervised pretraining. Several papers introduce different pretext tasks and evaluate feature transfer on datasets like ImageNet and PASCAL VOC. Recent work combines multiple pretext tasks and shows improved generalization across tasks and datasets.
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Emerging Properties in Self-Supervised Vision Transformers - Sungchul Kim
The document summarizes the DINO self-supervised learning approach for vision transformers. DINO uses a teacher-student framework where the teacher's predictions are used to supervise the student through knowledge distillation. Two global and several local views of an image are passed through the student, while only global views are passed through the teacher. The student is trained to match the teacher's predictions for local views. DINO achieves state-of-the-art results on ImageNet with linear evaluation and transfers well to downstream tasks. It also enables vision transformers to discover object boundaries and semantic layouts.
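The teacher in DINO is not trained by gradients; its weights track the student's via an exponential moving average. A one-line sketch of that update (parameter shapes and the momentum value here are illustrative):

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.996):
    """DINO-style teacher update (sketch): each teacher parameter is an
    exponential moving average of the corresponding student parameter.
    No gradient ever flows into the teacher."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

Because the teacher averages many recent versions of the student, its predictions are more stable, which is what makes it a usable supervision signal for the student.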
Attentive semantic alignment with offset aware correlation kernels - NAVER Engineering
Semantic correspondence is the problem of establishing correspondences across images depicting different instances of the same object or scene class. One recent approach to this problem is to estimate parameters of a global transformation model that densely aligns one image to the other. Since an entire correlation map between all feature pairs across images is typically used to predict such a global transformation, noisy features from different backgrounds, clutter, and occlusion distract the predictor from correct estimation of the alignment. This is a challenging issue, in particular, in the problem of semantic correspondence, where a large degree of image variation is often involved. In this paper, we introduce an attentive semantic alignment method that focuses on reliable correlations, filtering out distractors. For effective attention, we also propose an offset-aware correlation kernel that learns to capture translation-invariant local transformations in computing correlation values over spatial locations. Experiments demonstrate the effectiveness of the attentive model and offset-aware kernel, and the proposed model combining both techniques achieves state-of-the-art performance.
Seed net: automatic seed generation with deep reinforcement learning for robust interactive segmentation - NAVER Engineering
This paper proposes a seed generation technique using deep reinforcement learning to solve the interactive segmentation problem. One of the central issues in interactive segmentation is minimizing user intervention; the proposed system generates artificial seeds on the user's behalf, so the user only needs to provide the initial seed information. Because the ambiguity in defining an optimal seed point makes supervised training difficult, the authors overcome this with reinforcement learning: they define an MDP suited to the seed generation problem and successfully train a deep Q-network. Trained on the MSRA10K dataset, the system shows superior performance compared to the inaccurate initial results of existing segmentation algorithms.
In this presentation we discuss the convolution operation, the architecture of a convolutional neural network, and the different layers such as pooling. This presentation draws heavily from A. Karpathy's Stanford course CS 231n.
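The convolution operation the presentation covers reduces to sliding a kernel over the image and summing elementwise products (CNN "convolution" is technically cross-correlation, since the kernel is not flipped). A minimal valid-mode sketch:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid-mode 2D convolution (sketch; cross-correlation, as in CNNs).
    Slides the kernel over the image and sums elementwise products."""
    kh, kw = kernel.shape
    h = (image.shape[0] - kh) // stride + 1
    w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out
```

Real frameworks add padding, multiple input/output channels, and vectorized implementations, but the arithmetic per output element is exactly this.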
The presentation covers convolutional neural network (CNN) design. First, the main building blocks of CNNs are introduced. Then we systematically investigate the impact of a range of recent advances in CNN architectures and learning methods on the object categorization (ILSVRC) problem. The evaluation tests the influence of the following architectural choices: non-linearity (ReLU, ELU, maxout, compatibility with batch normalization), pooling variants (stochastic, max, average, mixed), network width, classifier design (convolution, fully-connected, SPP), and image pre-processing, as well as learning parameters: learning rate, batch size, cleanliness of the data, etc.
CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps - NAVER Engineering
Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only few, possibly ambiguous cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off; using fewer but larger cells results in lower location accuracy while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each classifier votes for the fine-grained classes that overlap with their respective coarse-grained ones. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves the state-of-the-art performance in location recognition on multiple benchmark datasets.
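The voting scheme can be sketched as follows: each coarse partitioning contributes its cell's score to every candidate fine region contained in that cell, and the region with the highest total wins. A toy example with illustrative names, not CPlaNet's actual implementation:

```python
import numpy as np

def vote(coarse_scores, memberships):
    """Combinatorial-partitioning vote (sketch).
    coarse_scores: one score vector per coarse partitioning (classifier output).
    memberships: for each partitioning, an array mapping every candidate
    fine region (a cell intersection) to its coarse cell index.
    Each partitioning votes for the fine regions overlapping its cells;
    votes are summed and the best fine region is returned."""
    total = np.zeros(len(memberships[0]))
    for scores, member in zip(coarse_scores, memberships):
        total += scores[member]
    return int(np.argmax(total)), total
```

The fine region that two partitionings jointly single out can win even though neither coarse classifier, on its own, distinguishes it from its neighbors.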
This document discusses domain transfer and domain adaptation in deep learning. It begins with introductions to domain transfer, which learns a mapping between domains, and domain adaptation, which learns a mapping between domains with labels. It then covers several approaches for domain transfer, including neural style transfer, instance normalization, and GAN-based methods. It also discusses general approaches for domain adaptation such as source/target feature matching and target data augmentation.
The document describes a vehicle detection system using a fully convolutional regression network (FCRN). The FCRN is trained on patches from aerial images to predict a density map indicating vehicle locations. The proposed system is evaluated on two public datasets and achieves higher precision and recall than comparative shallow and deep learning methods for vehicle detection in aerial images. The system could help with applications like urban planning and traffic management.
This document discusses real-time image processing. It begins with an introduction and definitions of real-time and non-real-time processing. It then discusses the requirements for a real-time image processing platform, including high resolution/frame rate video input and low latency. The document outlines some advantages of real-time image processing such as immediate results and automation. It then provides an overview of an object detection system using Viola-Jones detection with integral images, AdaBoost learning, and a cascade classifier structure. Experimental results show the cascade classifier can detect faces in real-time.
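The integral image (summed-area table) used by Viola-Jones is what makes the Haar-feature evaluation fast: after one pass over the image, any rectangular sum costs four lookups. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) using four table lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

A Haar-like feature is then just a signed combination of two or three such box sums, which is why the detector can evaluate thousands of features per window in real time.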
This document describes research on 3D reconstruction of solder balls on printed circuit boards (PCBs). 360 X-ray images of a PCB were taken every 2.81 degrees and reconstructed using the simultaneous algebraic reconstruction technique (SART) and iterative algorithms to generate a 3D model. Unity software was used to build a 3D visualization with zoom and rotation capabilities. Google Cardboard VR was used to create a mobile application to view the 3D model. The reconstruction aims to detect defects in solder balls without damaging the PCBs.
Realtime pothole detection system using improved CNN Modelsnithinsai2992
The document summarizes work on a real-time pothole detection system using improved CNN models. It discusses using the YOLOv5 model for pothole detection and training YOLOv5m6, YOLOv5s6, and YOLOv5n6 models on a dataset, achieving mAP scores of 80.8%, 82.2%, and 82.5% respectively. It also proposes further improving the system through techniques like better image processing during nighttime and enhancing detection of distant objects.
BIG DATA-DRIVEN FAST REDUCING THE VISUAL BLOCK ARTIFACTS OF DCT COMPRESSED IM...IJDKP
1) The document proposes a new simple method to reduce visual block artifacts in images compressed using DCT (used in JPEG) for urban surveillance systems.
2) The method smooths only the connection edges between adjacent blocks while keeping other image areas unchanged.
3) Simulation results show the proposed method achieves better image quality as measured by PSNR compared to median and wiener filters, while using significantly less computational resources.
An improved image compression algorithm based on daubechies wavelets with ar...Alexander Decker
This document summarizes an academic article that proposes a new image compression algorithm using Daubechies wavelets and arithmetic coding. It first discusses existing image compression techniques and their limitations. It then describes the proposed algorithm, which applies Daubechies wavelet transform followed by 2D Walsh wavelet transform on image blocks and arithmetic coding. Results show the proposed method achieves higher compression ratios and PSNR values than existing algorithms like EZW and SPIHT. Future work aims to improve results by exploring different wavelets and compression techniques.
This document provides an overview of image processing algorithms for real-time embedded systems. It discusses objectives like image enhancement, restoration, feature extraction and compression. Technologies applied include the TMS320C6713 DSP, Code Composer Studio, MATLAB and OpenCV. Image enhancement algorithms covered are contrast stretching, window-level slicing, and histogram equalization. Image restoration techniques include low pass, high pass and rank order filtering. Feature extraction methods include edge detection and image segmentation. Wavelet-based techniques are also discussed for edge detection and denoising. Implementation challenges for real-time embedded systems are addressed.
The slides for the techniques used in the Temporal Segment Network (TSN), including the basic ideas, recall of BN-Inception, optical flow and tricks in application. Used in group paper reading in University of Sydney.
The document introduces various computer vision topics including convolutional neural networks, popular CNN architectures, data augmentation, transfer learning, object detection, neural style transfer, generative adversarial networks, and variational autoencoders. It provides overviews of each topic and discusses concepts such as how convolutions work, common CNN architectures like ResNet and VGG, why data augmentation is important, how transfer learning can utilize pre-trained models, how object detection algorithms like YOLO work, the content and style losses used in neural style transfer, how GANs use generators and discriminators, and how VAEs describe images with probability distributions. The document aims to discuss these topics at a practical level and provide insights through examples.
This document discusses objective video quality measurement based on the human visual system. It introduces various deblocking algorithms used to improve the quality of reconstructed video by reducing blocking artifacts. It also discusses limitations of traditional PSNR metrics and proposes a no-reference quality assessment method. The proposed method considers aspects of the human visual system like masking effects and uses algorithms in the DCT domain and post-processing to evaluate video quality in a way that correlates better with subjective human perception. Experimental results on distorted video sets demonstrate the effectiveness of the proposed no-reference quality measurement approach.
Real Time Sign Language Recognition Using Deep LearningIRJET Journal
The document describes a study that used the YOLOv5 deep learning model to perform real-time sign language recognition. The researchers trained and tested the model on the Roboflow dataset along with additional images. They achieved 88.4% accuracy, 76.6% precision, and 81.2% recall. For comparison, they also trained a CNN model which achieved lower accuracy of 52.98%. The YOLOv5 model was able to detect signs in complex environments and perform accurate real-time detection, demonstrating its advantages over CNN for this task.
This document proposes a region-based object tracking method that uses both global-viewed and local-viewed trackers with Adaboost-based feature selection. The global-viewed tracker uses seed features and Adaboost to track objects at the pixel level. The local-viewed tracker regionalizes the image using k-means clustering, then applies seed features and Adaboost within each region to provide compensation. A manual refinement tool and confidence measurement are used to combine the trackers' results. Experimental results on a test video sequence demonstrate the method can track multiple non-rigid objects.
This document proposes and evaluates methods for video-to-video translation using CycleGAN. It begins with a baseline method that applies CycleGAN to each video frame independently, resulting in inconsistent translations between frames. An improved method adds a flow-guided loss term to CycleGAN that considers optical flow between frames, producing more temporally coherent translations. Evaluation shows the flow-guided method generates higher-quality translations that better preserve details and consistency across frames when translating videos between day and night domains. Further optimizations to the model are suggested to improve results.
DEEP NEURAL NETWORKS APPLIED TO LOW POWER ONBOARD IMAGE COMPRESSION
Over the past decade, rapid developments in digital technologies and access to space have enabled unprecedented capabilities of monitoring our planet and, more generally, our Universe.
This new space race is pushing for a paradigm shift in order to respond to the ever-increasing challenge of delivering useful information to end users. With a huge number of satellites, greater spatial and spectral resolutions, higher temporal cadence, and shrinking spectrum resources, on-board data reduction becomes not only a cost-saving solution but, in many cases, also a key enabling technology to achieve viable missions.
https://atpi.eventsair.com/obpdc2022/
The document discusses Arabic optical character recognition (AOCR). It introduces AOCR and its challenges. It then describes the preprocessing steps of image rotation, segmentation, and enhancement. It explains the feature extraction process and the features selected. It details the implementation of an AOCR system using Hidden Markov Models in HTK, including data preparation, model creation, and recognition. It presents experimental results on isolated character recognition with variations in font, size, and length. Recognition accuracy was highest using vertical histograms and modeling each character.
Build Your Own 3D Scanner: 3D Scanning with Structured Lighting — Douglas Lanman
http://mesh.brown.edu/byo3d/
SIGGRAPH 2009 Courses
Douglas Lanman and Gabriel Taubin
This course provides a beginner with the necessary mathematics, software, and practical details to leverage projector-camera systems in their own 3D scanning projects. An example-driven approach is used throughout; each new concept is illustrated using a practical scanner implemented with off-the-shelf parts. The course concludes by detailing how these new approaches are used in rapid prototyping, entertainment, cultural heritage, and web-based applications.
Questions Log: Dynamic Cubes – Set to Retire Transformer?Senturus
This document contains a questions log from a webinar about optimizing Cognos performance. It includes questions from webinar attendees about topics like using virtual cubes and dynamic cubes to address large data volumes, optimizing in-memory aggregates, hardware sizing requirements for dynamic cubes, and configuration considerations when using dynamic cubes. The questions are answered in detail to help attendees understand how to best implement and optimize dynamic cubes in Cognos.
This document discusses using fully convolutional neural networks for defect inspection. It begins with an agenda that outlines image segmentation using FCNs and defect inspection. It then provides details on data preparation including labeling guidelines, data augmentation, and model setup using techniques like deconvolution layers and the U-Net architecture. Metrics for evaluating the model like Dice score and IoU are also covered. The document concludes with best practices for successful deep learning projects focusing on aspects like having a large reusable dataset, feasibility of the problem, potential payoff, and fault tolerance.
DALL-E is a large AI model that can generate images from text descriptions. It was trained on a dataset of text-image pairs using a two-stage process: 1) A discrete variational autoencoder (dVAE) learned a visual codebook to represent images as discrete latent codes, and 2) A Transformer model learned the joint distribution between text captions and latent image codes to generate new images. The model achieved impressive zero-shot image generation capabilities, generalizing to new concepts and combining ideas in novel ways, as demonstrated through both quantitative and qualitative evaluation.
Similar to [CVPR2020] Simple but effective image enhancement techniques (20)
This paper proposes AmbientGAN, which trains a generative adversarial network using partial or noisy observations rather than fully observed samples. AmbientGAN trains the discriminator on the measurement domain rather than the raw data domain, allowing the generator to be trained without needing large amounts of good training data. The paper proves it is theoretically possible to recover the original data distribution even when the measurement process is not invertible. It presents experimental results showing AmbientGAN can generate high quality samples and recover the underlying data distribution from various types of lossy and noisy measurements.
[PR12] categorical reparameterization with gumbel softmaxJaeJun Yoo
(Korean) Introduction to (paper1) Categorical Reparameterization with Gumbel Softmax and (paper2) The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Video: https://youtu.be/ty3SciyoIyk
Paper1: https://arxiv.org/abs/1611.01144
Paper2: https://arxiv.org/abs/1611.00712
[PR12] understanding deep learning requires rethinking generalizationJaeJun Yoo
The document discusses a paper that argues traditional theories of generalization may not fully explain why large neural networks generalize well in practice. It summarizes the paper's key points:
1) The paper shows neural networks can easily fit random labels, calling into question traditional measures of complexity.
2) Regularization helps but is not the fundamental reason for generalization. Neural networks have sufficient capacity to memorize data.
3) Implicit biases in algorithms like SGD may better explain generalization by driving solutions toward minimum norm.
4) The paper suggests rethinking generalization as the effective capacity of neural networks may differ from theoretical measures. Understanding finite sample expressivity is important.
The document discusses capsule networks, a type of neural network proposed by Geoff Hinton in 2017 as an alternative to convolutional neural networks (CNNs) for computer vision tasks. Capsule networks aim to address some limitations of CNNs, such as their inability to capture spatial relationships and pose information. The key concepts discussed include dynamic routing between capsules, which allows for parts-based representation, and equivariance, where capsules can learn transformation properties like position and orientation. The document provides an overview of a capsule network architecture and routing algorithm proposed in a 2017 paper by Sabour et al.
[PR12] Inception and Xception - Jaejun YooJaeJun Yoo
This document discusses Inception and Xception models for computer vision tasks. It describes the Inception architecture, which uses 1x1, 3x3 and 5x5 convolutional filters arranged in parallel to capture correlations at different scales more efficiently. It also describes the Xception model, which entirely separates cross-channel correlations and spatial correlations using depthwise separable convolutions. The document compares different approaches for reducing computational costs like pooling and strided convolutions.
Introduction to domain adversarial training of neural network.
(Kor) video : https://www.youtube.com/watch?v=n2J7giHrS-Y&t=1s
Papers: A survey on transfer learning, SJ Pan 2009 / A theory of learning from different domains, S Ben-David et al. 2010 / Domain-Adversarial Training of Neural Networks, Y Ganin 2016
Slides I refered:
http://www.di.ens.fr/~germain/talks/nips2014_dann_slides.pdf
http://john.blitzer.com/talks/icmltutorial_2010.pdf (DA theory part)
https://epat2014.sciencesconf.org/conference/epat2014/pages/slides_DA_epat_17.pdf (DA theory part)
https://www.slideshare.net/butest/ppt-3860159 (DA theory part)
A curated list of GAN variants that provided insights to the community (GANs, Improved GANs, DCGAN, Unrolled GAN, InfoGAN, f-GAN, EBGAN, WGAN).
After a short introduction to GANs, we look at the remaining difficulties of standard GANs and their interim solutions (Improved GANs). The slides then cover other approaches that tried to resolve these problems in various ways, e.g., careful architecture selection (DCGAN), a slight change to the update rule (Unrolled GAN), additional constraints (InfoGAN), generalization of the loss function to various divergences (f-GAN), a new framework based on energy-based models (EBGAN), and a further generalization of the loss function (WGAN).
[CVPR2020] Simple but effective image enhancement techniques
1. Image Enhancement
via a Simple but (very) Effective way
Style transfer your image in a “photographic” way, e.g., day2sunset. “CutBlur”: a powerful data augmentation method for various low-level vision tasks.
Jaejun Yoo AI research scientist / Clova AI
Postdoctoral researcher / EPFL
Code, generated images,
and pre-trained models
are all available at
github.com/clovaai/WCT2
Code, generated images,
and pre-trained models
are all available at
github.com/clovaai/cutblur
Leave your contact information
and feedback here. Join Clova !
2. Image enhancement
Image enhancement is the process of adjusting digital images so that the results are more suitable for
display or further image analysis.
Traditionally…
[Figure: denoising and super-resolution examples — low-resolution input vs. baseline vs. proposed]
3. Image enhancement (extended)
Image enhancement is the process of adjusting digital images so that the results are more suitable for
display or further image analysis.
Traditionally… + generate (or translate into) an authentic image
4. • WCT2: Photorealistic Style Transfer via Wavelet Transforms (ICCV’19)
Clearer (authentic) output with an 840× faster and 51% lighter (in memory) model (current SOTA)
• CutBlur: Rethinking Data Augmentation for Image Super-resolution (CVPR’20)
Current SOTA in Real-world Super-resolution (RealSR)
• SimUSR: A Simple but Strong Baseline for Unsupervised Image Super-resolution (CVPRW’20)
Ranked PSNR 1st and SSIM 2nd in unsupervised Super-resolution Competition (NTIRE)
Contents
Our goal: Solving these problems in a simple and intuitive way while also
achieving huge improvement over the previous SOTA.
5. WCT2 (ICCV’19)
: A simple correction of the network architecture using wavelets
Clova AI Research 1 Yonsei University 2
Jaejun Yoo1* Youngjung Uh1* Sanghyuk Chun1* Byeongkyu Kang1,2 Jung-Woo Ha1
6. Artistic Style Transfer
Gatys et al., CVPR ’16: using CNN representations (VGG)
9. Transfer your style in “photographic way”, e.g., day2sunset, day2night, etc.
Photorealistic Style Transfer
10. Artistic: Whitening and Coloring Transforms (WCT)
orthogonal matrix of eigenvectors
diagonal matrix with the eigenvalues of the covariance matrix
Whitening
centered content feature
- from Li et al., NIPS 2017
12. Artistic: Whitening and Coloring Transforms (WCT)
orthogonal matrix of eigenvectors
Coloring
centered style feature
diagonal matrix with the eigenvalues of the covariance matrix
- from Li et al., NIPS 2017
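The whitening and coloring equations that these slide annotations describe (the formula images were lost in this transcript) can be reconstructed from the standard WCT formulation of Li et al. (NIPS 2017); here f_c is the centered content feature and f_s the centered style feature:

```latex
% Whitening: decorrelate the centered content feature f_c, using the
% eigendecomposition of its covariance, f_c f_c^{\top} = E_c D_c E_c^{\top}:
\hat{f}_c = E_c \, D_c^{-1/2} \, E_c^{\top} f_c
% Coloring: impose the style covariance f_s f_s^{\top} = E_s D_s E_s^{\top}
% on the whitened feature:
\hat{f}_{cs} = E_s \, D_s^{1/2} \, E_s^{\top} \hat{f}_c
```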
13. Artistic: Whitening and Coloring Transforms (WCT)
[Figure: content images vs. single-level vs. multi-level stylization]
- from Li et al., NIPS 2017
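As a sanity check on the whitening and coloring steps, here is a minimal NumPy sketch (the `wct` helper and the toy features are our own, not from the slides or the official code): it whitens a content feature map and then colors it with the style statistics, so the output covariance matches the style covariance:

```python
import numpy as np

def wct(fc, fs, eps=1e-8):
    """Whitening-and-coloring transform on flattened features of shape (C, H*W)."""
    mc = fc.mean(axis=1, keepdims=True)
    ms = fs.mean(axis=1, keepdims=True)
    fc, fs = fc - mc, fs - ms
    # Whitening: decorrelate the centered content feature.
    Ec, Dc, _ = np.linalg.svd(fc @ fc.T / (fc.shape[1] - 1))
    f_white = Ec @ np.diag((Dc + eps) ** -0.5) @ Ec.T @ fc
    # Coloring: impose the style covariance on the whitened feature.
    Es, Ds, _ = np.linalg.svd(fs @ fs.T / (fs.shape[1] - 1))
    f_cs = Es @ np.diag((Ds + eps) ** 0.5) @ Es.T @ f_white
    return f_cs + ms  # re-center with the style mean

rng = np.random.default_rng(0)
fc = rng.normal(size=(4, 1024))                             # "content" feature
fs = rng.normal(size=(4, 4)) @ rng.normal(size=(4, 1024))   # correlated "style"
out = wct(fc, fs)
print(np.allclose(np.cov(out), np.cov(fs), atol=1e-6))  # True
```

In the actual method this transform is applied to VGG encoder features, and the result is fed to a decoder to produce the stylized image.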
14. PhotoWCT
WCT (artistic model): “the VGG decoder uses nearest-neighbor upsampling”
PhotoWCT (photorealistic model): “provide the decoder with the locations where the pooling operation happened (unpooling)”
- from Li et al., ECCV 2018
17. [Figure: content vs. PhotoWCT, PhotoWCT + smoothing, and ours, with runtimes in seconds — Yoo et al., ICCV 2019]
“Our new model shows better performance even without any post-processing!”
18. WCT via Wavelet Corrected Transforms (WCT2)
To force the encoder-decoder to learn a function with the following properties:
1. It should play a role similar to the original pooling (a global filter),
2. It should not lose information during the encoding and decoding process,
3. It should be able to represent the features of input images.
19. WCT via Wavelet Corrected Transforms (WCT2)
* Theoretical motivation: Perfect reconstruction (PR) condition
To force the encoder-decoder to learn a function with the following properties:
1. It should play a role similar to the original pooling (a global filter),
2. It should not lose information during the encoding and decoding process,
3. It should be able to represent the features of input images.
20. WCT via Wavelet Corrected Transforms (WCT2)
To force the encoder-decoder to learn a function with the following properties:
1. It should play a role similar to the original pooling (a global filter),
2. It should not lose information during the encoding and decoding process,
3. It should be able to represent the features of input images.
21. WCT via Wavelet Corrected Transforms (WCT2)
Haar wavelet kernels, built from the 1-D low-pass filter Lᵀ = (1/√2)[1, 1] and high-pass filter Hᵀ = (1/√2)[1, −1], so that each 2×2 kernel carries an overall factor of 1/2:

LL = ½ [[1, 1], [1, 1]],  LH = ½ [[1, −1], [1, −1]],  HL = ½ [[1, 1], [−1, −1]],  HH = ½ [[1, −1], [−1, 1]]
To force the encoder-decoder to learn a function with the following properties:
1. It should play a role similar to the original pooling (a global filter),
2. It should not lose information during the encoding and decoding process,
3. It should be able to represent the features of input images.
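The perfect-reconstruction property that motivates these kernels can be checked directly. Below is a small NumPy sketch (our own toy code, not the official WCT2 implementation at github.com/clovaai/WCT2): it decomposes an image into the four Haar subbands with stride 2 and then reconstructs it exactly, which is precisely what max-pooling cannot do:

```python
import numpy as np

# Haar analysis filters: 1-D low-pass L and high-pass H,
# combined into the four 2x2 kernels LL, LH, HL, HH.
L = np.array([1.0, 1.0]) / np.sqrt(2.0)
H = np.array([1.0, -1.0]) / np.sqrt(2.0)
KERNELS = {"LL": np.outer(L, L), "LH": np.outer(L, H),
           "HL": np.outer(H, L), "HH": np.outer(H, H)}

def haar_decompose(x):
    """Split an image with even height/width into four stride-2 subbands."""
    return {name: sum(k[i, j] * x[i::2, j::2]
                      for i in range(2) for j in range(2))
            for name, k in KERNELS.items()}

def haar_reconstruct(subbands):
    """Invert the decomposition exactly (perfect reconstruction condition)."""
    h, w = subbands["LL"].shape
    out = np.zeros((2 * h, 2 * w))
    for name, k in KERNELS.items():
        for i in range(2):
            for j in range(2):
                out[i::2, j::2] += k[i, j] * subbands[name]
    return out

img = np.random.default_rng(0).random((8, 8))
rec = haar_reconstruct(haar_decompose(img))
print(np.allclose(img, rec))  # True: no information is lost in encoding/decoding
```

Because the four kernels form an orthonormal basis of 2×2 patches, the synthesis step recovers every pixel exactly; in WCT2 the LL branch plays the role of pooling while the high-frequency branches carry the detail to the decoder.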
22. WCT via Wavelet Corrected Transforms (WCT2)
- removing multi-level stylization (less error propagation)
1. Better stylization (preferred 6× more often by users)
2. Faster model (840× faster than the previous SOTA)
3. Lighter model (51% less memory than the previous SOTA)
4. Stronger model (the only model that can process 1k-resolution images in under 4 seconds)
To force the encoder-decoder to learn a function with the following properties:
25. WCT via Wavelet Corrected Transforms (WCT2)
“Photorealistic video stylization results (from day-to-sunset). Given
a style and video frames (top), we show the results by WCT2
(middle) and PhotoWCT (bottom) without semantic segmentation
and post-processing. Despite the lack of segmentation map, WCT2
shows photorealistic results while keeping temporal consistency. On
the other hand, PhotoWCT generates spotty and varying artifacts
over frames, which harm the photorealism.”
* Video Style Transfer
Sequential consistency over the frames without imposing
further constraints.
26. WCT via Wavelet Corrected Transforms (WCT2)
[Figures: user study results (40 pairs of content and style, 41 subjects); computational cost in seconds; SSIM index vs. style loss, with the ideal case marked]
Substitute all pooling layers with wavelet filters (wavelet corrected transform)
• Note that this is a general architectural change, not bound to the stylization method!
• You can use our wavelet corrected model with other methods such as AdaIN.
Enjoy the power of a lossless image-reconstructing network!
• Note that this also opens a new venue for other applications such as image restoration tasks (e.g., denoising, super-resolution, dehazing, etc.)!
Summary
Simple solution,
But Effective !
28. CutBlur (CVPR’20)
: The first data augmentation method for various low-level vision tasks
Clova AI Research 1 Ajou University 2
Jaejun Yoo1* Namhyuk Ahn1,2* Kyung-Ah Sohn2
29. Some spoilers :)
• First to provide a comprehensive analysis of recent DA methods on super-resolution (SR)
• A new data augmentation (DA) strategy, “CutBlur”, is proposed.
• Our method provides consistent and significant improvements on the SR task.
• By simply applying our DA method, a model from ’17 can already achieve state-of-the-art (SOTA) performance in the RealSR competition.
• Last but not least, our method also improves other low-level vision tasks, such as denoising and compression artifact removal.
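In code, the core CutBlur operation is just a cut-and-paste between the HR ground truth and its LR counterpart upsampled to the same size, in a randomly chosen direction. The sketch below is a simplification under our own assumptions — a fixed cut size controlled by `alpha`, whereas the official implementation at github.com/clovaai/cutblur samples the cut ratio randomly:

```python
import numpy as np

def cutblur(lr_up, hr, alpha=0.7, rng=None):
    """CutBlur sketch: swap a random patch between the LR input (already
    upsampled to HR size) and the HR ground truth. The target stays HR."""
    rng = rng or np.random.default_rng()
    h, w = hr.shape[:2]
    ch, cw = int(h * alpha), int(w * alpha)   # cut-region size
    cy = rng.integers(0, h - ch + 1)          # random top-left corner
    cx = rng.integers(0, w - cw + 1)
    if rng.random() < 0.5:
        # paste an HR patch into the LR input (LR -> HR direction) ...
        aug = lr_up.copy()
        aug[cy:cy + ch, cx:cx + cw] = hr[cy:cy + ch, cx:cx + cw]
    else:
        # ... or an LR patch into the HR image used as input (HR -> LR)
        aug = hr.copy()
        aug[cy:cy + ch, cx:cx + cw] = lr_up[cy:cy + ch, cx:cx + cw]
    return aug, hr

lr_up = np.zeros((8, 8))   # stand-in for a bicubically upsampled LR image
hr = np.ones((8, 8))       # stand-in for the HR ground truth
x, y = cutblur(lr_up, hr, alpha=0.5, rng=np.random.default_rng(0))
```

Because the cut content stays pixel-aligned with the target, there are no sharp boundary artifacts of the kind that CutMix-style augmentation introduces, and the model is encouraged to learn where, and how much, to super-resolve.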
37. Analysis on existing DA methods
"Sharp transitions, mixed image contents, or losing the relationships of pixels can
degrade SR performance."
e.g., Cutout fails (it discards pixels) and every feature-space method fails (feature manipulation).
Training curves when feature-space DAs are applied
38. Analysis on existing DA methods
• DA methods in pixel space bring
some improvements when applied
very carefully.
39. Analysis on existing DA methods
• DA methods in pixel space bring
some improvements when applied
very carefully.
• Cutout:
The original setting (dropping 25% of pixels in a
rectangular shape) significantly degrades
performance because it erases too much spatial
information. However, erasing a tiny amount of
pixels (0.1% random pixels, i.e., 2~3 pixels of a
48x48 input patch) boosts performance.
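The tiny-Cutout variant is easy to sketch. This is an illustrative NumPy version (function name and the zero fill value are my choices, not from the paper's code):

```python
import numpy as np

def cutout_pixels(img, ratio=0.001, rng=None):
    """Erase a tiny fraction of random pixels by setting them to zero.

    With ratio=0.001 on a 48x48 patch this erases about 2~3 pixels,
    matching the setting that the slides report as beneficial.
    """
    rng = np.random.default_rng(rng)
    out = img.copy()
    h, w = img.shape[:2]
    n = max(1, int(h * w * ratio))       # number of pixels to erase
    ys = rng.integers(0, h, size=n)      # random coordinates (may collide)
    xs = rng.integers(0, w, size=n)
    out[ys, xs] = 0.0
    return out
```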
40. Analysis on existing DA methods
• DA methods in pixel space bring
some improvements when applied
very carefully.
• Mixup & CutMix:
The improvement from CutMix is marginal. We
suspect this happens because CutMix generates a
drastically sharp transition between two different
images.
The improvement from Mixup is better than
CutMix, but it still generates unrealistic images and
affects the image structure.
Mixup / CutMix
41. Analysis on existing DA methods
• DA methods in pixel space bring
some improvements when applied
very carefully.
• CutMixup:
To verify our hypothesis, we combine the benefits of
Mixup and CutMix into CutMixup. CutMixup
provides various boundary cases while minimizing
the sharp transition, by retaining partial cues as
Mixup does.
CutMixup
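The combination can be sketched as pasting a Mixup-blended box into the first image, so the boundary is softened by the blend. A minimal single-image NumPy sketch (the box size and Beta parameter here are illustrative; the paper applies the same operation consistently to the LR input and HR target):

```python
import numpy as np

def cutmixup(img1, img2, mix_alpha=0.7, cut_ratio=0.5, rng=None):
    """Paste a rectangular region of mixup(img1, img2) into img1.

    Inside the box the two images are blended (as in Mixup), so the
    transition at the box boundary is softer than in plain CutMix.
    """
    rng = np.random.default_rng(rng)
    lam = rng.beta(mix_alpha, mix_alpha)          # Mixup coefficient
    h, w = img1.shape[:2]
    ch, cw = int(h * cut_ratio), int(w * cut_ratio)
    y = rng.integers(0, h - ch + 1)               # random box position
    x = rng.integers(0, w - cw + 1)
    out = img1.copy()
    box = slice(y, y + ch), slice(x, x + cw)
    out[box] = lam * img1[box] + (1 - lam) * img2[box]
    return out
```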
42. Analysis on existing DA methods
• DA methods in pixel space bring
some improvements when applied
very carefully.
• Blend & RGB permutation:
To push further, we tried constant blending and
RGB channel permutation, which turn out to be
very simple but effective strategies showing a big
performance enhancement (in dB).
Note that neither method incurs any structural
modification to the image.
Blend / RGB perm.
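Both structure-preserving augmentations fit in a few lines each. A NumPy sketch (the blend-strength range and random color are illustrative choices, assuming images in [0, 1]):

```python
import numpy as np

def blend(img, alpha_range=(0.6, 1.0), rng=None):
    """Blend the image with a random constant color.

    Pixel relationships are preserved: every pixel moves by the same
    affine map, so no spatial structure is modified.
    """
    rng = np.random.default_rng(rng)
    alpha = rng.uniform(*alpha_range)
    color = rng.uniform(0.0, 1.0, size=(1, 1, 3))  # one constant RGB color
    return alpha * img + (1 - alpha) * color

def rgb_permute(img, rng=None):
    """Randomly shuffle the RGB channels (no spatial change at all)."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(3)
    return img[..., perm]
```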
44. CutBlur
• What does the model learn from CutBlur?
• CutBlur prevents the SR model from over-sharpening an image and helps it super-resolve only the
necessary regions.
Super-resolution results of a model (EDSR) trained without CutBlur, with its error residual (Δ)
45. CutBlur
• What does the model learn from CutBlur?
• CutBlur prevents the SR model from over-sharpening an image and helps it super-resolve only the
necessary regions.
Super-resolution results of a model (EDSR) trained with CutBlur, with its error residual (Δ)
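The CutBlur operation itself is a cut-and-paste between the HQ image and its LQ counterpart upsampled to the same size (so the pair is pixel-aligned). A minimal NumPy sketch; the box ratio and the 50/50 direction flip are illustrative choices:

```python
import numpy as np

def cutblur(lr_up, hr, cut_ratio=0.4, rng=None):
    """Cut a random box and paste LQ content into HQ, or vice versa.

    lr_up is the low-quality image upsampled to HR size, so both
    inputs share resolution and content; only local degradation
    differs inside vs. outside the box.
    """
    rng = np.random.default_rng(rng)
    h, w = hr.shape[:2]
    ch, cw = int(h * cut_ratio), int(w * cut_ratio)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    if rng.random() < 0.5:
        out = hr.copy()
        out[y:y+ch, x:x+cw] = lr_up[y:y+ch, x:x+cw]  # LQ patch into HQ
    else:
        out = lr_up.copy()
        out[y:y+ch, x:x+cw] = hr[y:y+ch, x:x+cw]     # HQ patch into LQ
    return out
```

Because the target stays the full HQ image, the model must learn "how much" to super-resolve each region rather than blindly sharpening everything.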
49. Mixture of Augmentation (MoA)
• During the training phase …
• Randomly select a single augmentation at
every step (from the curated DA list).
• Apply it!
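The MoA loop reduces to one uniform draw per training step. A toy NumPy sketch with a two-entry pool (the real curated list in the paper also includes CutBlur, CutMixup, Blend, Cutout, etc.; the function names here are illustrative):

```python
import numpy as np

def identity(lr, hr, rng):
    """No-op augmentation (keeps some clean pairs in training)."""
    return lr, hr

def rgb_permute_pair(lr, hr, rng):
    """Apply the same channel permutation to LR input and HR target."""
    perm = rng.permutation(lr.shape[-1])
    return lr[..., perm], hr[..., perm]

# Toy pool standing in for the paper's curated DA list.
AUG_POOL = [identity, rgb_permute_pair]

def mixture_of_augmentation(lr, hr, pool=AUG_POOL, rng=None):
    """Randomly select a single augmentation per step and apply it."""
    rng = np.random.default_rng(rng)
    aug = pool[rng.integers(len(pool))]
    return aug(lr, hr, rng)
```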
50. Comparison on diverse benchmark models and datasets
• SRCNN (0.07M) – ECCV’14, CARN (1.14M) – ECCV’18, RCAN (15.6M) – ECCV’18, EDSR (43.1M) – CVPRW’17
• DIV2K (synthetic), RealSR (real-world)
• Our method shows consistent improvements across models (parameter counts) and
datasets (different environments and sizes):
51. Use CutBlur: cut and paste LQ patches into the corresponding HQ images (or vice versa)!
Use our curated list of augmentation methods to further improve performance!
Mixture of Augmentation
Enjoy a performance boost of at least 0.22 dB for your model :)
+ additional positive side effects as well
Summary
Simple solution,
But Effective !
52. SimUSR (CVPRW’20)
: Simple but Strong Baseline for Unsupervised Image Super-resolution
Clova AI Research 1 Ajou University 2
Jaejun Yoo1* Namhyuk Ahn1,2* Kyung-Ah Sohn2
57. Zero-shot SR (ZSSR); previous SOTA
• Tackles the truly unsupervised SR task (no I_HR given)
• Uses only a single LR image (I_LR)
• Runs both training and inference online (at runtime)
• Training: optimize an image-specific network on the pair
(I_LR↓, I_LR), where I_LR↓ is a downsampled version of I_LR
• Inference: same as supervised SR
• Pros and cons:
• (+) Only a single image is required; learns internal statistics
• (−) Extremely high latency
• (−) Hard to benefit from a large-capacity network
59. Our method: SimUSR
• Relax the constraint of ZSSR by assuming that
LR images (I_LR^1, ..., I_LR^N) are easy to collect
• Train the model offline, run inference online
• Generate pairs (I_LR^i↓, I_LR^i) following ZSSR
• Now the unsupervised SR task turns into a supervised one
• Benefits of SimUSR
• (+) Low latency
• (+) Enjoys every advantage of the supervised framework
• Can use any network, unlike ZSSR and MZSR [1]
• Can apply data augmentation [2]
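Building the offline training set is a one-liner over the LR collection. A NumPy sketch (the box-average downsampler stands in for bicubic):

```python
import numpy as np

def downsample(img, scale=2):
    """Stand-in downsampler (box average; the method would use bicubic)."""
    h = img.shape[0] - img.shape[0] % scale
    w = img.shape[1] - img.shape[1] % scale
    img = img[:h, :w]
    return img.reshape(h // scale, scale, w // scale, scale, -1).mean(axis=(1, 3))

def simusr_dataset(lr_images, scale=2):
    """Turn LR images (I_LR^1 ... I_LR^N) into supervised pairs.

    Each LR image plays the role of the HR target one scale down, so
    any off-the-shelf supervised SR model can be trained offline on
    these (input, target) pairs.
    """
    return [(downsample(img, scale), img) for img in lr_images]
```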
60. Our method: SimUSR
• Yes! We are just doing supervised learning at one scale lower and
relying on its generalizability across scales.
61. Our method: SimUSR
• Though this line of study is easy to think of, and thus SHOULD HAVE BEEN investigated
before any complicated unsupervised methods, surprisingly, no such work currently exists.
62. Our method: SimUSR
• Even better, this simple method outperforms the SOTA method with dramatically shorter
runtime latency, and significantly reduces the gap to the supervised models.
63. Experiment: Bicubic SR
• To analyze multiple methods simultaneously in the supervised setup
• Compare supervised SR / ZSSR / SimUSR
• For SimUSR, we use CARN / RCAN / EDSR as a backbone
• SimUSR shows a large improvement over ZSSR (Table 1)
• A larger network achieves better performance (e.g., CARN vs. RCAN)
• SimUSR further reduces the gap to supervised SR using augmentation (Table 2)
Note that ours achieves performance almost on par with the supervised SR.
64. Experiment: Real-world SR
• Compare ZSSR and our SimUSR on NTIRE 2020 dataset
: We improved the previous SOTA based on our observations
1) ZSSR suffers from noise
→ add BM3D as pre-processing
2) Certain data augmentation harms the performance
→ remove Affine transformation
65. Experiment: Real-world SR
• Our SimUSR outperforms ZSSR by a huge margin
in both SR performance and latency
66. Qualitative comparison and competition results
NTIRE 20 Real-world SR challenge
Track 1 (image processing artifacts)
• 1st rank of PSNR
• 2nd rank of SSIM
• 13th rank of LPIPS
Bicubic dataset
NTIRE 2020 dataset
67. Solve the supervised learning problem at one scale lower and apply the model to the original problem!
Use our data augmentation methods for better generalization :)
Enjoy SOTA performance in the unsupervised SR world :)
Summary
Simple solution,
But Effective !
68. Style-transfer your images in a "photographic way", e.g., day2sunset.
"CutBlur": a powerful data augmentation method for various low-level vision tasks.
That’s all!
Enjoy your CVPR!
Code, generated images,
and pre-trained models
are all available at
github.com/clovaai/WCT2
Code, generated images,
and pre-trained models
are all available at
github.com/clovaai/cutblur