Restricting the Flow: Information Bottlenecks for Attribution (ICLR 2020)
Fundamental Team: 고형권, 김동희, 김창연, 송헌, 이민경, 이재윤
2021.02.07
딥러닝읽기모임 (Deep Learning Reading Group)
Understanding Deep Neural Networks
- What is an attribution map? A map providing insight into the DNN's decision-making.
- Given an input image, how can we "explain" the network's prediction of Dog or Cat?
  → Find the regions the network is looking at (visual explanation).
[Figure: input image → DNN → Dog / Cat prediction]
Visual Explanation
- Visualizing attribution maps [Selvaraju et al. 2017; Fong et al. 2017]

White-box approach
- Pros: simple and fast.
- Cons: requires tractable internal components (e.g., gradients, activations); depends on the network architecture.

Black-box approach
- Pros: model-agnostic, so it can interpret black-box models.
- Cons: difficult to optimize.
Introduction
- Restricting the Flow: Information Bottlenecks for Attribution (ICLR 2020)

Existing attribution heatmaps sometimes highlight subjectively irrelevant areas, and this might correctly reflect the network's unexpected way of processing the data.
→ The authors therefore propose a novel attribution method that uses the information bottleneck concept to estimate the amount of information each image region provides to the network's prediction.
Preliminaries
- Information Bottleneck [Tishby et al. 2000]

  max I[Y; Z] − β I[X; Z]

Z is a new random variable; Y is the label and X the input. β controls the trade-off between predicting the labels well and using little information about X.

Goal: minimize the information flow while maximizing the original model objective.

A common way to reduce the amount of information: interpolate the l-th layer output with noise.

  Z = λ(X) R + (1 − λ(X)) ε,  λ_i ∈ [0, 1]

  R = f_l(X): the l-th layer output
  ε ~ N(μ_R, σ_R²): noise with the same mean and variance as R

1. λ_i(X) = 1 → all information is transmitted (Z_i = R_i)
2. λ_i(X) = 0 → all information in R_i is lost (Z_i = ε_i)
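The noise interpolation above can be sketched in a few lines of NumPy. This is an illustrative toy: here μ_R and σ_R are estimated from the single feature map R itself, whereas the paper uses per-channel statistics estimated over the dataset.

```python
import numpy as np

def bottleneck(R, lam, rng):
    """Z = lam * R + (1 - lam) * eps with eps ~ N(mu_R, sigma_R^2).

    Sketch only: mu_R / sigma_R are taken from this single R; the paper
    estimates them per channel over a dataset.
    """
    eps = rng.normal(R.mean(), R.std(), size=R.shape)
    return lam * R + (1.0 - lam) * eps

rng = np.random.default_rng(0)
R = np.arange(12, dtype=float).reshape(3, 4)   # stand-in for f_l(X)
Z_keep = bottleneck(R, np.ones_like(R), rng)   # lam = 1: Z = R, all info kept
Z_lost = bottleneck(R, np.zeros_like(R), rng)  # lam = 0: Z = eps, all info lost
```

With a per-element λ mask, this smoothly interpolates between transmitting the feature map and replacing it with uninformative noise.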
Proposed Method
- Information bottleneck for attribution

We now need to estimate how much information about R is contained in Z.

Variational approximation: Q(Z) = N(μ_R, σ_R²), under the assumption that all dimensions of Z are normally distributed and independent.

  I[R; Z] = E_R[ D_KL(P(Z|R) ‖ P(Z)) ],  but  p(z) = ∫ p(z|r) p(r) dr  is intractable.

Inserting q(z):

  I[R; Z] = ∫ p(r) ∫ p(z|r) log [ p(z|r) / p(z) ] dz dr
          = ∬ p(r, z) log [ p(z|r) q(z) / (p(z) q(z)) ] dz dr
          = ∬ p(r, z) log [ p(z|r) / q(z) ] dz dr + ∬ p(r, z) log [ q(z) / p(z) ] dz dr
          = ∬ p(r, z) log [ p(z|r) / q(z) ] dz dr + ∫ p(z) ( ∫ p(r|z) dr ) log [ q(z) / p(z) ] dz
          = E_R[ D_KL(P(Z|R) ‖ Q(Z)) ] − D_KL(P(Z) ‖ Q(Z))
          ≤ E_R[ D_KL(P(Z|R) ‖ Q(Z)) ]   (since D_KL ≥ 0)

[Klambauer et al. 2017]
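Because both P(Z|R) and Q(Z) are Gaussians, each term of the bound has a closed form. A small sketch: from the noise interpolation, P(Z|R=r) = N(λr + (1−λ)μ_R, (1−λ)²σ_R²), and Q(Z) = N(μ_R, σ_R²); the concrete numbers below are made up for illustration.

```python
import numpy as np

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL( N(mu_p, var_p) || N(mu_q, var_q) ), per dimension."""
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

# one bottleneck dimension: lam = 0.5, feature value r = 2, Q(Z) = N(0, 1)
lam, r, mu_R, var_R = 0.5, 2.0, 0.0, 1.0
mu_z = lam * r + (1.0 - lam) * mu_R        # mean of P(Z|R=r)
var_z = (1.0 - lam) ** 2 * var_R           # variance of P(Z|R=r)
L_I = kl_gauss(mu_z, var_z, mu_R, var_R)   # one term of E_R[KL(P(Z|R)||Q(Z))]
```

At λ = 0 the two distributions coincide and the KL (hence the information cost) is zero; any λ > 0 pays a positive number of bits.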
Proposed Method
- Information bottleneck for attribution (cont.)

  I[R; Z] = E_R[ D_KL(P(Z|R) ‖ Q(Z)) ] − D_KL(P(Z) ‖ Q(Z))
  L_I := E_R[ D_KL(P(Z|R) ‖ Q(Z)) ] ≥ I[R; Z]

If L_I = 0 for an area, the information from this area is not necessary for the network's prediction.

Their goal is to keep only the information necessary for correct classification: the mutual information should be minimal while the classification score remains high.

Total objective function [Alemi et al. 2017]:

  L = L_CE + β L_I

β controls the relative importance of the two objectives: for a small β more bits of information flow, and for a higher β fewer do.
Proposed Method: Per-Sample Bottleneck

Parameterization: the bottleneck parameters λ have to lie in [0, 1], so they parametrize λ = sigmoid(α) with α ∈ ℝ^d.

Initialization: at the start they want to retain all information, so they initialize α_i = 5, giving λ ≈ 0.993 and hence Z ≈ R. Initially the bottleneck has practically no impact on the model's performance; it then deviates from this starting point to suppress unimportant regions.

Optimization: 10 iterations of Adam with learning rate 1 to fit the mask α. To stabilize training, the single sample is copied 10 times and different noise is applied to each copy.
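The per-sample procedure can be sketched end-to-end on a toy problem. This is not the authors' implementation: it uses a two-dimensional feature vector, an invented linear "classifier" that only reads dimension 0, finite-difference gradients, and plain gradient descent in place of Adam and autograd. It only illustrates the intended behavior: the informative dimension keeps λ high while the uninformative one is suppressed.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kl_term(lam, r, mu=0.0, var=1.0):
    # closed-form KL(P(Z|R=r) || Q(Z)) per dim for Z = lam*r + (1-lam)*eps
    mu_z = lam * r + (1.0 - lam) * mu
    var_z = (1.0 - lam) ** 2 * var + 1e-12      # avoid log(0) at lam = 1
    return 0.5 * (np.log(var / var_z) + (var_z + (mu_z - mu) ** 2) / var - 1.0)

def loss(alpha, r, beta):
    lam = sigmoid(alpha)
    ce = -lam[0] * r[0]          # toy "classifier score": only dim 0 matters
    return ce + beta * kl_term(lam, r).sum()

r = np.array([3.0, 3.0])         # feature vector R; both dims carry bits
alpha = np.full(2, 5.0)          # init alpha_i = 5 -> lam ~ 0.993, Z ~ R
beta, lr, h = 0.1, 1.0, 1e-5
for _ in range(100):             # finite-difference gradient descent on alpha
    g = np.zeros_like(alpha)
    for i in range(len(alpha)):
        d = np.zeros_like(alpha); d[i] = h
        g[i] = (loss(alpha + d, r, beta) - loss(alpha - d, r, beta)) / (2 * h)
    alpha -= lr * g
lam = sigmoid(alpha)
# dim 0 supports the score, so lam[0] stays high; dim 1 only costs bits,
# so the bottleneck drives lam[1] toward 0
```

Both dimensions pay the same information cost, but only dimension 0 buys classification score, so the optimized mask separates them.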
Proposed Method: Per-Sample Bottleneck (cont.)

Measure of information in Z (i.e., D_KL(P(Z|R) ‖ Q(Z)) per dimension): sum over the channel axis to obtain the attribution map

  m(h, w) = Σ_{i=0}^{c} D_KL( P(Z[i,h,w] | R[i,h,w]) ‖ Q(Z[i,h,w]) )

Enforcing local smoothness: pooling and strided convolutions ignore parts of the input, causing the Per-Sample Bottleneck to overfit to a grid structure.
→ Convolve the sigmoid output with a fixed Gaussian kernel with standard deviation σ_s:

  λ = blur(σ_s, sigmoid(α))
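A sketch of the channel-summed attribution map and the Gaussian smoothing step, in pure NumPy. All shapes are invented, and the separable blur uses a plain 'same'-size convolution without edge renormalization.

```python
import numpy as np

rng = np.random.default_rng(0)

# per-dimension KL values D_KL(P(Z[i,h,w]|R) || Q) for a (C, H, W) layer
kl = rng.random((8, 16, 16))
m = kl.sum(axis=0)               # attribution map m(h, w): sum over channels

def gaussian_kernel1d(sigma):
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(sigma, lam):
    """Separable Gaussian smoothing of an (H, W) mask ('same'-size conv)."""
    k = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, lam)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)

alpha = rng.normal(size=(16, 16))
lam = blur(1.0, 1.0 / (1.0 + np.exp(-alpha)))   # lam = blur(sigma_s, sigmoid(alpha))
```

Because the kernel is non-negative and sums to one, the smoothed mask stays in [0, 1], so it remains a valid λ.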
Proposed Method: Readout Bottleneck

Collect feature maps from different depths and combine them with a 1×1 convolution:
1. In a first forward pass, no noise is added; the feature maps from different depths are collected and bilinearly interpolated to match the spatial dimensions.
2. In a second forward pass, the bottleneck layer is inserted into the network to restrict the flow of information.
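The combination step can be sketched as follows. The shapes are invented, and nearest-neighbor upsampling stands in for the paper's bilinear interpolation.

```python
import numpy as np

rng = np.random.default_rng(0)

# a deep feature map at coarse resolution, resized to the common (H, W) grid
deep = rng.random((16, 4, 4))
up = deep.repeat(2, axis=1).repeat(2, axis=2)   # (16, 8, 8); nearest-neighbor
                                                # stand-in for bilinear resize

# stack maps from three depths along the channel axis
feats = [rng.random((4, 8, 8)), rng.random((8, 8, 8)), up]
stacked = np.concatenate(feats, axis=0)         # (28, 8, 8)

# a 1x1 convolution is just a per-pixel linear map over channels
w = rng.random((1, 28))
mask_logits = np.einsum('oc,chw->ohw', w, stacked)   # (1, 8, 8) mask logits
```

Passing these logits through a sigmoid (and the smoothing above) yields the λ mask used in the second forward pass.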
Evaluation: Qualitative Assessment

Subjectively, both the Per-Sample and Readout Bottlenecks identify the areas relevant to the classification well, and they are more specific: fewer pixels are scored high.
Evaluation: Sanity Check (Randomization of Model Parameters) [Adebayo et al. 2018]

Starting from the last layer, an increasing proportion of the network parameters is re-initialized until all parameters are random. The difference between the original heatmap and the heatmap obtained from the randomized model is quantified with SSIM.

For their methods, randomizing the final dense layer alone drops the mean SSIM by around 0.4.
Evaluation: Sensitivity-N [Ancona et al. 2018]

Randomly mask the network's input and measure how strongly the amount of attribution inside the mask correlates with the drop in classifier score:

  corr( Σ_{i∈T_n} R_i(x),  S_c(x) − S_c(x[T_n = 0]) )

where S_c(x) is the classifier logit for class c and x[T_n = 0] is the input with all pixels in T_n set to zero.

The Per-Sample Bottleneck (β = 10/k) performs best for both models above n = 2·10³ pixels (i.e., when more than 2% of all pixels are masked).
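The metric can be sketched with a linear toy model whose exact attribution is known; the model, sizes, and trial count here are invented, and a real evaluation would use the network's logits.

```python
import numpy as np

def sensitivity_n(attr, score_fn, x, n, trials=100, rng=None):
    """Correlation between attribution mass in random n-pixel subsets T_n
    and the resulting drop in class score (sketch of Sensitivity-n)."""
    rng = rng or np.random.default_rng(0)
    base = score_fn(x)
    sums, drops = [], []
    for _ in range(trials):
        idx = rng.choice(x.size, size=n, replace=False)   # random subset T_n
        masked = x.copy()
        masked[idx] = 0.0                                 # pixels in T_n -> 0
        sums.append(attr[idx].sum())
        drops.append(base - score_fn(masked))
    return np.corrcoef(sums, drops)[0, 1]

# toy linear "classifier": logit S_c(x) = w . x, so the exact attribution
# of pixel i is w_i * x_i, and the score drop equals the attribution mass
rng = np.random.default_rng(1)
w, x = rng.random(64), rng.random(64)
corr = sensitivity_n(w * x, lambda v: float(w @ v), x, n=8)
```

For this linear model the correlation is essentially 1: an attribution method scores well exactly when its per-pixel scores predict the effect of removing those pixels.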
Evaluation: Localization

1. Bounding box: if the bounding box contains n pixels, measure how many of the n highest-scored pixels fall inside the box, then divide by n to get a ratio.

2. Image degradation: remove the tiles ranked most relevant first (MoRF) versus remove the tiles ranked least relevant first (LeRF), and track the normalized score

  s(x) = ( p(y|x) − b ) / ( t₁ − b )

where t₁ is the top-1 probability on the original samples and b is the mean model output on the fully degraded images.

The LeRF and MoRF degradation curves measure different qualities of the attribution method.
→ Calculate the integral between the two curves.
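The bounding-box metric can be sketched directly; the shapes and the toy attribution map are invented.

```python
import numpy as np

def bbox_localization(attr, box_mask):
    """Fraction of the n highest-scored pixels inside the n-pixel box."""
    n = int(box_mask.sum())
    top = np.argsort(attr.ravel())[::-1][:n]   # indices of the n top scores
    return box_mask.ravel()[top].sum() / n

attr = np.zeros((4, 4)); attr[1:3, 1:3] = 1.0   # all attribution in the box
box = np.zeros((4, 4), dtype=bool); box[1:3, 1:3] = True
ratio = bbox_localization(attr, box)            # -> 1.0: perfect localization
```

A score of 1.0 means every top-ranked pixel falls inside the object's box; a method that scatters attribution outside the box scores proportionally lower.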
Thank You
Q&A
arkimjh@naver.com
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
DIENES _ Organic Chemistry _ B. Pharm..pptx
DIENES _ Organic Chemistry _  B. Pharm..pptxDIENES _ Organic Chemistry _  B. Pharm..pptx
DIENES _ Organic Chemistry _ B. Pharm..pptx
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarter
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
 

Restricting the Flow: Information Bottlenecks for Attribution

  • 1. Restricting the Flow: Information Bottlenecks for Attribution (ICLR '20). Fundamental Team (펀디멘탈팀): 고형권, 김동희, 김창연, 송헌, 이민경, 이재윤. 2021.02.07. Deep Learning Reading Group (딥러닝읽기모임)
  • 2. Understanding Deep Neural Networks. What is an attribution map? It provides insight into a DNN's decision-making. Given an input image, how do we "explain" the network's prediction of Dog or Cat? → Find the region the network is looking at (visual explanation).
  • 3. Visual Explanation: Visualizing Attribution Maps. White-box approach - Pros: simple and fast. Cons: requires tractable internal components (e.g., gradients, activations) and depends on the network architecture. Black-box approach - Pros: model-agnostic, offers interpretability even for black-box models. Cons: difficult to optimize. [Selvaraju et al. 2017; Fong et al. 2017]
  • 4. Introduction: Restricting the Flow: Information Bottlenecks for Attribution (ICLR '20). Existing attribution heatmaps often highlight subjectively irrelevant areas, and this might still correctly reflect an unexpected way the network processes the data. → The authors therefore propose a novel attribution method that uses the information bottleneck concept to estimate the amount of information each image region provides to the network's prediction.
  • 5. Preliminaries: the Information Bottleneck [Tishby et al. 2000]. Objective: max I[Y; Z] − β I[X; Z], where X is the input, Y is the label, Z is a new random variable, and β controls the trade-off between predicting the labels well and using little information about X. Goal: minimize the information flow while maximizing the original model objective. A common way to reduce the amount of information: Z = λ(X) R + (1 − λ(X)) ε, with λ_i ∈ [0, 1], R = f_l(X) the l-th layer output, and ε ~ N(μ_R, σ_R²) with the same mean and variance as R. 1. λ_i(X) = 1 → all information is transmitted (Z_i = R_i). 2. λ_i(X) = 0 → all information in R_i is lost (Z_i = ε_i).
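The interpolation Z = λ(X) R + (1 − λ(X)) ε can be sketched in a few NumPy lines. This is an illustrative sketch, not the authors' implementation; `R` here is a random stand-in for an l-th layer output:

```python
import numpy as np

rng = np.random.default_rng(0)

# R: stand-in for the l-th layer output (batch of feature vectors)
R = rng.normal(loc=2.0, scale=3.0, size=(1000, 8))

# epsilon ~ N(mu_R, sigma_R^2), with mean/variance estimated per feature
mu_R = R.mean(axis=0)
sigma_R = R.std(axis=0)
eps = rng.normal(mu_R, sigma_R, size=R.shape)

def bottleneck(R, eps, lam):
    # lam in [0, 1] interpolates between keeping R and replacing it with noise
    return lam * R + (1.0 - lam) * eps

Z_keep = bottleneck(R, eps, 1.0)   # lam = 1: Z = R, all information transmitted
Z_drop = bottleneck(R, eps, 0.0)   # lam = 0: Z = eps, all information lost
```

Because ε shares R's statistics, a fully noised unit is indistinguishable from a real activation downstream, which is what makes the replacement an information restriction rather than an obvious out-of-distribution perturbation.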
  • 6. Proposed Method: Information Bottleneck for Attribution. Now it is required to estimate how much information Z retains about R: I[R; Z] = E_R[D_KL[P(Z|R) ‖ P(Z)]], but p(z) = ∫ p(z|r) p(r) dr is intractable. Variational approximation: Q(Z) = N(μ_R, σ_R²) (assumption: all dimensions of Z are normally distributed and independent). Then
I[R; Z] = ∫ p(r) ∫ p(z|r) log [p(z|r) / p(z)] dz dr
= ∫∫ p(r, z) log [ (p(z|r) / p(z)) · (q(z) / q(z)) ] dz dr
= ∫∫ p(r, z) log [p(z|r) / q(z)] dz dr + ∫∫ p(r, z) log [q(z) / p(z)] dz dr
= ∫∫ p(r, z) log [p(z|r) / q(z)] dz dr + ∫ p(z) (∫ p(r|z) dr) log [q(z) / p(z)] dz
= E_R[D_KL[P(Z|R) ‖ Q(Z)]] − D_KL[P(Z) ‖ Q(Z)]
≤ E_R[D_KL[P(Z|R) ‖ Q(Z)]] [Klambauer et al. 2017]
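Since P(Z|R) and Q(Z) are both Gaussian, the per-dimension KL term in the bound has a closed form. A minimal sketch, with Z = λr + (1 − λ)ε so that P(Z|R=r) = N(λr + (1 − λ)μ_R, (1 − λ)²σ_R²); the scalar values `mu_R`, `var_R`, `r` below are illustrative assumptions:

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    # Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for scalar Gaussians
    return 0.5 * (var_p / var_q + (mu_p - mu_q) ** 2 / var_q
                  - 1.0 + np.log(var_q / var_p))

mu_R, var_R, r = 0.0, 1.0, 2.5   # toy feature statistics and one observation

def info_term(lam):
    # P(Z|R=r) = N(lam*r + (1-lam)*mu_R, (1-lam)^2 * var_R), Q(Z) = N(mu_R, var_R)
    mu_p = lam * r + (1.0 - lam) * mu_R
    var_p = (1.0 - lam) ** 2 * var_R
    return kl_diag_gauss(mu_p, var_p, mu_R, var_R)

kl_closed = info_term(0.0)   # lam = 0: Z carries no information about R -> KL = 0
kl_half = info_term(0.5)
kl_open = info_term(0.9)     # lam near 1: Z carries much more information
```

The KL cost grows monotonically as λ opens the bottleneck, which is exactly the quantity the attribution map later visualizes per spatial location.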
  • 7. Proposed Method: Information Bottleneck for Attribution. From I[R; Z] = E_R[D_KL[P(Z|R) ‖ Q(Z)]] − D_KL[P(Z) ‖ Q(Z)], defining ℒ_I := E_R[D_KL[P(Z|R) ‖ Q(Z)]] gives ℒ_I ≥ I[R; Z] [Alemi et al. 2017]. If ℒ_I = 0 for an area → information from this area is not necessary for the network's prediction. The goal is to keep only the information necessary for correct classification: the mutual information should be minimal while the classification score remains high. Total objective: ℒ = ℒ_CE + β ℒ_I, where β controls the relative importance of the two objectives (for a small β, more bits of information flow; for a larger β, fewer).
  • 8. Proposed Method: Per-Sample Bottleneck. Parameterization: the bottleneck parameters λ have to lie in [0, 1], so they parametrize λ = sigmoid(α) with α ∈ ℝ^d. Initialization: at the start they want to retain all information → initialize α_i = 5, so λ ≈ 0.993 ⟹ Z ≈ R. At first, the bottleneck has practically no impact on the model's prediction; it then deviates from this starting point to suppress unimportant regions. Optimization: 10 iterations of Adam with learning rate 1 to fit the mask α. To stabilize training, the single sample is copied 10 times and different noise is applied to each copy.
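The effect of β on the fitted mask can be illustrated with a toy scalar version of the objective. This is plain gradient descent on a hypothetical stand-in loss, not the authors' Adam setup: the quadratic term (1 − λ)² plays the role of ℒ_CE (losing the feature hurts "classification") and β·λ plays the role of β ℒ_I (transmitted information is priced):

```python
import math

sigmoid = lambda a: 1.0 / (1.0 + math.exp(-a))

# Initialization as on the slide: alpha = 5 => lambda ~ 0.993, Z ~ R
alpha0 = 5.0

def grad(alpha, beta):
    # d/d_alpha of L(alpha) = (1 - sigmoid(alpha))^2 + beta * sigmoid(alpha)
    lam = sigmoid(alpha)
    dlam = -2.0 * (1.0 - lam) + beta       # dL/d_lambda
    return dlam * lam * (1.0 - lam)        # chain rule through the sigmoid

def optimize(beta, steps=500, lr=1.0):
    alpha = alpha0
    for _ in range(steps):
        alpha -= lr * grad(alpha, beta)
    return sigmoid(alpha)

lam_cheap = optimize(beta=0.01)   # information is cheap: the feature is kept
lam_costly = optimize(beta=10.0)  # information is costly: the feature is suppressed
```

Starting from λ ≈ 0.993, the mask stays open when β is small and is driven toward zero when β is large, mirroring the trade-off described on slide 7.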
  • 9. Proposed Method: Per-Sample Bottleneck. Measure of information in Z per spatial location (i.e., D_KL(P(Z|R) ‖ Q(Z)) per dimension), summed over the channel axis: m[h, w] = Σ_c D_KL(P(Z[c, h, w] | R[c, h, w]) ‖ Q(Z[c, h, w])). Enforcing local smoothness: pooling and strided convolutions ignore parts of the input, causing the Per-Sample Bottleneck to overfit to a grid structure → convolve the sigmoid output with a fixed Gaussian kernel of standard deviation σ_s: λ = blur(σ_s, sigmoid(α)).
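The channel sum and the smoothing step can be sketched as follows. A hedged sketch: the random `kl` tensor stands in for the real per-unit KL terms, and the hand-rolled separable blur stands in for the fixed Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in per-unit information tensor: one KL value per (channel, h, w)
kl = rng.random((16, 4, 4))
heatmap = kl.sum(axis=0)          # m[h, w] = sum over the channel axis

def gaussian_kernel1d(sigma, radius=3):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(mask, sigma):
    # Separable 2-D Gaussian blur: filter columns, then rows, with edge padding
    k = gaussian_kernel1d(sigma)
    pad = len(k) // 2
    conv = lambda v: np.convolve(np.pad(v, pad, mode='edge'), k, mode='valid')
    tmp = np.apply_along_axis(conv, 0, mask)
    return np.apply_along_axis(conv, 1, tmp)

mask = 1.0 / (1.0 + np.exp(-rng.normal(size=(8, 8))))  # sigmoid(alpha)
lam = blur(mask, sigma=1.0)       # lambda = blur(sigma_s, sigmoid(alpha))
```

The blur is a convex combination of neighboring mask values, so λ stays inside (0, 1) while sharp grid artifacts in sigmoid(α) are averaged out.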
  • 10. Proposed Method: Readout Bottleneck. Collect feature maps from different depths and combine them with 1×1 convolutions. ① In a first forward pass, no noise is added; feature maps from different depths are collected and bilinearly interpolated to a common spatial dimension. ② In a second forward pass, the bottleneck layer is inserted into the network to restrict the flow of information.
  • 11. Evaluation: Qualitative Assessment. Subjectively, both the Per-Sample and Readout Bottlenecks identify areas relevant to the classification well, and their maps are more specific (fewer pixels are scored high).
  • 13. Evaluation: Sanity Check (Randomization of Model Parameters) [Adebayo et al. 2018]. Starting from the last layer, an increasing proportion of the network parameters is re-initialized until all parameters are random. The difference between the original heatmap and the heatmap obtained from the randomized model is quantified with SSIM. For the proposed methods, randomizing only the final dense layer already drops the mean SSIM by around 0.4.
  • 14. Evaluation: Sensitivity-N [Ancona et al. 2018]. Masks the network's input randomly and measures how strongly the amount of attribution inside the mask correlates with the drop in classifier score: corr(Σ_{i ∈ T_n} R_i(x), S_c(x) − S_c(x_{T_n = 0})), where S_c is the classifier logit for class c and x_{T_n = 0} is the input with all pixels in T_n set to zero. The Per-Sample Bottleneck (β = 10/k) performs best for both models above n = 2·10³ pixels (i.e., when more than 2% of all pixels are masked).
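A toy version of the Sensitivity-N measurement: for a linear model with gradient×input attribution, the summed attribution inside each random mask matches the score drop exactly, so the correlation is (numerically) 1. All names here are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

w = rng.normal(size=100)        # toy linear "classifier": S(x) = w @ x
x = rng.normal(size=100)
attribution = w * x             # gradient * input, exact for a linear model

def score(x):
    return w @ x

sums, drops = [], []
n = 20                          # mask n of 100 "pixels" per trial
for _ in range(200):
    T = rng.choice(100, size=n, replace=False)
    x_masked = x.copy()
    x_masked[T] = 0.0           # set all pixels in T_n to zero
    sums.append(attribution[T].sum())
    drops.append(score(x) - score(x_masked))

corr = np.corrcoef(sums, drops)[0, 1]
```

For real nonlinear networks the correlation is well below 1, and Sensitivity-N ranks attribution methods by how much of it they retain at each mask size n.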
  • 15. Evaluation: Localization. 1. Bounding box: if the bounding box contains n pixels, measure how many of the n highest-scored pixels fall inside the box (then divide by n to get a ratio). 2. Image degradation: remove the tiles ranked most relevant first (MoRF) ⟹ versus removing the tiles ranked least relevant first (LeRF), tracking the normalized score s(x) = (p(y|x) − b) / (t_1 − b), where t_1 is the top-1 probability on the original sample and b is the mean model output on the fully degraded images. The LeRF and MoRF degradation curves measure different qualities of the attribution method → calculate the integral between the two curves.
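The bounding-box ratio can be sketched directly. A synthetic heatmap with an artificially hot box region; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

H = W = 32
heatmap = rng.random((H, W))
heatmap[8:16, 8:16] += 10.0      # make the "object" region clearly hottest

# Bounding box mask; its pixel count n sets how many top pixels we inspect
inside = np.zeros((H, W), dtype=bool)
inside[8:16, 8:16] = True
n = int(inside.sum())

# Ratio = fraction of the n highest-scored pixels that lie inside the box
top_n = np.argsort(heatmap.ravel())[::-1][:n]
ratio = inside.ravel()[top_n].mean()
```

A perfectly localized heatmap scores 1.0 on this metric; a uniformly random one scores roughly the box's share of the image area.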
  • 16. Thank You. Q&A. Deep Learning Reading Group (딥러닝읽기모임), Fundamental Team (펀디멘탈팀): 고형권, 김동희, 김창연, 송헌, 이민경, 이재윤. 2021.02.07. arkimjh@naver.com