Neur ips yomikai_at_ridgei_aaron_jan312020

Review of Filos et al. (2019):
“A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks”
NeurIPS Reading Group @ Ridge-i
January 31st 2020
Aaron C. Bell, Engineer

• Paper PDF: http://bayesiandeeplearning.org/2019/papers/12.pdf

The context: A major problem with DL

...and a major bottleneck in BDL
● UCI “toy” datasets limit research in BDL:
○ Yachts, Wine, Concrete, Energy...
● UCI “toy” datasets... are they too easy?

The context: A major problem with BDL?

What is Bayesian deep learning?
● Extension of Bayesian methods to deep learning (neural networks)
○ Taking account of prior information
○ Getting robust uncertainties on predictions
● Allows us to ask:
○ How powerful are your results... really?
○ Is higher accuracy really a significant result?
● Allows DL to be applied in real-world applications where uncertainties are critical
■ Opens the door to DL use for scientific hypothesis testing

The paper’s objectives:
1) Widen the bottleneck in BDL: provide a better benchmark dataset (than UCI)
2) Show off the strong points of BDL: present a specific, challenging, real-world example where BDL is needed: medical diagnosis.

A better benchmark dataset for BDL
● Step 1: Choose an existing dataset that’s suited for BDL’s strengths:
○ 1) High-dimensional
○ 2) Large number of examples
○ 3) Requiring more complex models
● Step 2: Enhance suitability for BDL benchmarking
○ 1) Pre-process the dataset.
○ 2) Develop API for benchmarking.

● Step 1: Choose an existing high-dimensional, large dataset:
○ Diabetic retinopathy (DR) “fundus” images (Kaggle dataset)

● Step 2: Pre-process the dataset:
○ Redefine the problem: 5 classes of diabetic retinopathy (DR) to binary
■ Original classes: 0: No DR, 1: Mild DR, 2: Moderate DR, 3: Severe DR, 4: Proliferative DR
■ Binary classes: 0: Sight not in danger, 1: Sight in danger
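The relabeling step can be sketched in a few lines of NumPy. This is a minimal illustration, not the benchmark's own code: it assumes the split described in the paper, where moderate DR or worse (grades 2 to 4) counts as sight-threatening, and the names `SIGHT_THREATENING_MIN_GRADE` and `binarize_dr_grades` are hypothetical.

```python
import numpy as np

# Assumption: grades 2-4 (moderate or worse) are treated as "sight in danger",
# grades 0-1 as "sight not in danger". The constant name is hypothetical.
SIGHT_THREATENING_MIN_GRADE = 2


def binarize_dr_grades(grades):
    """Map 5-class DR grades (0..4) to binary labels (0 or 1)."""
    grades = np.asarray(grades)
    if grades.min() < 0 or grades.max() > 4:
        raise ValueError("DR grades must be in the range 0..4")
    return (grades >= SIGHT_THREATENING_MIN_GRADE).astype(np.int64)


labels = binarize_dr_grades([0, 1, 2, 3, 4])
print(labels.tolist())  # [0, 0, 1, 1, 1]
```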

● Step 2: Pre-process the dataset:
○ Augment data: Make it challenging enough for BDL.

Objective 2) Show an example where BDL is needed
● Giving predictions with uncertainties
● Informing medical diagnosis
● Streamlining patient referrals
○ Low uncertainty: automatic final diagnosis
○ High uncertainty: referral to a “real” doctor
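The referral idea can be sketched as a simple triage rule: diagnose automatically only when the model is certain enough, and send everything else to a doctor. This is a hedged illustration only. The function name `triage`, the 0.5 decision threshold, and the `max_uncertainty` cutoff are all assumptions, not values from the paper.

```python
def triage(prob_positive, uncertainty, max_uncertainty):
    """Return one decision per image.

    prob_positive:   predicted probability that sight is in danger
    uncertainty:     any per-image uncertainty score (higher = less certain)
    max_uncertainty: hypothetical cutoff above which a human takes over
    """
    decisions = []
    for p, u in zip(prob_positive, uncertainty):
        if u > max_uncertainty:
            decisions.append("refer")                 # too uncertain: "real" doctor
        elif p >= 0.5:
            decisions.append("sight in danger")       # automatic final diagnosis
        else:
            decisions.append("sight not in danger")   # automatic final diagnosis
    return decisions


print(triage([0.9, 0.55, 0.1], [0.1, 0.6, 0.2], max_uncertainty=0.5))
# ['sight in danger', 'refer', 'sight not in danger']
```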

Method: Compare four BDL techniques
● Bayesian Neural Networks:
○ 1) Mean-field variational inference (MFVI)
○ 2) Monte Carlo Dropout (MC Dropout)
● 3) Model Ensembling: “Deep Ensemble”
● 4) Combine (2) and (3): “Ensemble MC Dropout”
● 5*) Deterministic baseline

Bayesian Neural Networks
● 1) Mean-field Variational Inference
● 2) Monte Carlo Dropout
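To make MC Dropout concrete: dropout stays switched on at prediction time, the same input is passed through the network many times, and the outputs are averaged. The sketch below uses a toy two-layer NumPy network with untrained random weights as a stand-in. The architecture, the function name `mc_dropout_predict`, and all parameter values are assumptions for illustration, not the paper's setup.

```python
import numpy as np


def mc_dropout_predict(x, w1, w2, drop_rate=0.5, n_samples=100, seed=0):
    """Average sigmoid outputs over n_samples stochastic forward passes."""
    rng = np.random.default_rng(seed)
    hidden = np.maximum(x @ w1, 0.0)                 # ReLU hidden layer
    samples = np.empty((n_samples, x.shape[0]))
    for t in range(n_samples):
        # A fresh dropout mask for every pass, applied at TEST time.
        mask = rng.random(hidden.shape) >= drop_rate
        dropped = hidden * mask / (1.0 - drop_rate)  # inverted-dropout scaling
        logits = dropped @ w2
        samples[t] = 1.0 / (1.0 + np.exp(-logits[:, 0]))
    # Mean is the prediction; the spread across passes signals uncertainty.
    return samples.mean(axis=0), samples.std(axis=0)


rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))      # 4 toy inputs with 8 features each
w1 = rng.normal(size=(8, 16))    # untrained stand-in weights
w2 = rng.normal(size=(16, 1))
mean_prob, spread = mc_dropout_predict(x, w1, w2)
print(mean_prob.shape, spread.shape)  # (4,) (4,)
```

With `drop_rate=0.0` every pass is identical, so the spread collapses to zero and the model reduces to a deterministic baseline.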

3) Model Ensembling
● No special training or inference techniques.
● Just train a bunch of models in parallel, with different initial conditions (ICs)
● Can be combined with MC Dropout
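A rough sketch of the ensembling recipe: train the same small network several times, changing only the random seed for the initial weights, then average the predictions. The tiny tanh network, the toy data, and the names `train_member` and `ensemble_predict` are assumptions chosen to keep the example short.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_member(x, y, seed, hidden=8, steps=300, lr=0.5):
    """Train one tiny tanh network from its own random initial conditions."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(x.shape[1], hidden))
    w2 = rng.normal(size=hidden)
    for _ in range(steps):
        h = np.tanh(x @ w1)
        p = sigmoid(h @ w2)
        err = (p - y) / len(y)                  # gradient of mean cross-entropy w.r.t. logits
        grad_w2 = h.T @ err
        grad_w1 = x.T @ (np.outer(err, w2) * (1.0 - h ** 2))
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2


def ensemble_predict(members, x):
    """Average member probabilities; their disagreement is a crude uncertainty."""
    probs = np.stack([sigmoid(np.tanh(x @ w1) @ w2) for w1, w2 in members])
    return probs.mean(axis=0), probs.std(axis=0)


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))                   # toy 2-D inputs
y = (x[:, 0] + x[:, 1] > 0).astype(float)       # toy binary labels
members = [train_member(x, y, seed=s) for s in range(5)]  # only the seed differs
mean_prob, spread = ensemble_predict(members, x)
print(mean_prob.shape, spread.shape)  # (200,) (200,)
```

The members are independent of each other, so in practice they can be trained in parallel.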

4) Ensemble MC Dropout
● An ensemble of MC dropout networks
○ MC simulation: dropout applied during test time
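One way to picture the combination: each of K ensemble members produces T dropout passes, and all K × T sampled probabilities are pooled before computing an uncertainty score. The sketch below assumes the samples are already available as an array and uses predictive entropy as the score, which is one common choice. The function name and array layout are assumptions.

```python
import numpy as np


def pooled_predictive_entropy(samples):
    """Pool sampled probabilities and score uncertainty per input.

    samples: array of shape (K models, T dropout passes, N inputs),
             each entry a sampled probability of the positive class.
    """
    samples = np.asarray(samples, dtype=float)
    # Pool all K*T stochastic passes into one predictive probability per input.
    p = samples.reshape(-1, samples.shape[-1]).mean(axis=0)
    eps = 1e-12  # guards against log(0)
    entropy = -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))
    return p, entropy


# Toy samples: K=2 models, T=3 passes, N=2 inputs.
# Input 0 is maximally ambiguous; input 1 is unanimously positive.
toy = np.tile(np.array([0.5, 1.0]), (2, 3, 1))
p, entropy = pooled_predictive_entropy(toy)
print(p.tolist())  # [0.5, 1.0]
```

Entropy peaks at log 2 when the pooled probability is 0.5 and approaches zero when all passes agree on a confident answer.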

Naive Baselines
● Deterministic
● Random

The state of the art... (SPOILER WARNING)
Is MFVI really the best BDL technique?
● UCI (easy) benchmarks: “Yes”
● This paper (hard) benchmark: “No”

Comparison of Various Approaches: Data retention

[Figure panels: In-domain (Kaggle DR); Out-of-domain (India blindness detection dataset)]
● All models converge on the full dataset (within the standard error bars), so the uncertainty comparison is fair.

● Ensemble MC Dropout always performs best at 50% data retention
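A data retention curve can be sketched as follows, assuming the usual setup: the most uncertain cases are referred to an expert, and the model is scored only on what it retains. The function name `accuracy_vs_retention`, the 0.5 decision threshold, and the toy numbers are assumptions for illustration.

```python
import numpy as np


def accuracy_vs_retention(y_true, prob_positive, uncertainty, fractions):
    """Accuracy on the most certain `fraction` of the data, for each fraction."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(prob_positive) >= 0.5).astype(int)
    order = np.argsort(uncertainty, kind="stable")   # most certain first
    correct = (y_pred == y_true)[order]
    results = {}
    for f in fractions:
        n_keep = max(1, int(round(f * len(y_true))))  # cases NOT referred
        results[f] = float(correct[:n_keep].mean())
    return results


curve = accuracy_vs_retention(
    y_true=[1, 0, 1, 0],
    prob_positive=[0.9, 0.2, 0.4, 0.6],
    uncertainty=[0.1, 0.2, 0.9, 0.8],
    fractions=[0.5, 1.0],
)
print(curve)  # {0.5: 1.0, 1.0: 0.5}
```

In this toy case the two wrong predictions are also the two most uncertain, so accuracy rises as they are referred away. A method with well-calibrated uncertainty shows this pattern; a poorly calibrated one does not.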

Major conclusions...
● Overuse of UCI may have misled the BDL community.
● Harder benchmarks give a better picture of BDL method performance
● BDL methods are suited for cases where uncertainty is critical for the downstream decision task (e.g., medical diagnosis, re-evaluation).
