G. Park, J.-Y. Yang, et al., NeurIPS 2020, MLILAB, KAIST AI
1. Attribution Preservation in Network Compression
for Reliable Network Interpretation
NeurIPS 2020
Geondo Park*, June Yong Yang*, Sung Ju Hwang, Eunho Yang
{geondopark, laoconeth, sjhwang82, eunhoy}@kaist.ac.kr
*Equal contribution.
2. Safety-Critical Deep Learning
Safety-critical applications require:
• Reliability: Must provide constant uptime.
• Cloud-based models are unreliable due to communication failures.
• Need to embed models directly on edge devices.
• Network compression is needed to fit them inside smaller spaces.
• Trust: Provide explanations to its decisions.
• ‘Why’ did the network run a red light?
• Explainable AI (XAI) algorithms are needed
[Figures: Autonomous Driving; X-ray Diagnosis]
3. Network Compression
Network Compression
• Reduce the network's computational (time and space) cost while maintaining
its predictive power
• Why: embedding DNNs directly in edge devices (reliability)
• Knowledge distillation, network pruning, weight quantization, etc.
4. Explainable AI
Explainable AI (XAI)
• DNNs are effectively black boxes
• Produce human-understandable interpretations of decisions and predictions of
neural networks
• Why: Trustworthiness, transparency
• Input attribution, interpretable models, etc.
1) Yulong Wang, PyTorch Visual Attribution, 2018, github.com/yulongwang12/visual-attribution
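As a concrete example of input attribution, gradient-times-input can be sketched on a toy linear scorer. This is a generic illustration, not the paper's method; the model, the function name `saliency_attribution`, and all values are made up for the example:

```python
import numpy as np

def saliency_attribution(W, x, target):
    """Gradient-times-input attribution for a toy linear scorer y = W @ x.

    For this linear model, the gradient of y[target] with respect to x is
    simply the row W[target], so each input feature's attribution is
    W[target] * x elementwise.
    """
    grad = W[target]   # d y[target] / d x
    return grad * x    # elementwise gradient-times-input

# Illustrative weights and input (not from the paper)
W = np.array([[0.5, -1.0, 2.0],
              [1.5,  0.0, -0.5]])
x = np.array([2.0, 1.0, 0.5])

attr = saliency_attribution(W, x, target=0)  # attributions: [1.0, -1.0, 1.0]
print(attr)
```

For deep networks the gradient is obtained by backpropagation rather than read off a weight row, but the attribution map has the same shape as the input, which is what the deformation problem below is about.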
5. Attribution Deformation Problem
Compression Breaks Attribution
• Compressed networks produce deformed attribution maps compared to their
pre-compression counterparts and to ground-truth segmentations.
• This occurs across a variety of compression methods and attribution methods
6. Attribution Matching to Preserve Attribution
• While compressing, make the attribution map of the network being compressed
follow the map of the pre-compression network
• Employ a matching loss that keeps the maps of the network being compressed
close to those of the pre-compression network
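The slides do not spell out the exact form of the matching loss; a minimal NumPy sketch, assuming a squared-L2 penalty between the two attribution maps with a hypothetical weight `lam` (both assumptions, not the paper's stated formulation):

```python
import numpy as np

def attribution_matching_loss(attr_student, attr_teacher, task_loss, lam=0.5):
    """Total loss = task loss + lam * squared-L2 distance between the
    attribution map of the network being compressed (student) and that of
    the frozen pre-compression network (teacher).

    `lam` and the squared-L2 form are illustrative assumptions.
    """
    match = np.sum((attr_student - attr_teacher) ** 2)
    return task_loss + lam * match

# Tiny 2x2 attribution maps for illustration
a_s = np.array([[0.2, 0.8], [0.1, 0.9]])  # compressing network
a_t = np.array([[0.3, 0.7], [0.1, 0.9]])  # pre-compression network

loss = attribution_matching_loss(a_s, a_t, task_loss=1.0, lam=0.5)
print(loss)  # 1.0 + 0.5 * (0.01 + 0.01) = 1.01
```

In training, `task_loss` would be the usual classification loss and the attribution maps would be recomputed per batch; only the matching term changes the compression procedure.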
9. Knowledge Distillation
Basic knowledge distillation
• Naïve knowledge distillation causes attribution map distortion
• Our framework keeps the student's attribution map close to the teacher's,
which in turn preserves attribution with respect to the ground truth.
• Attribution preservation also helps preserve predictive performance.
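For reference, the basic distillation term mentioned above can be sketched as a temperature-softened cross-entropy between teacher and student outputs. This is the standard formulation; the temperature `T=4.0` and the logits are illustrative choices, not the paper's settings:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives a softer distribution."""
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the temperature-softened teacher distribution
    and the student distribution (the classic distillation term)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -np.sum(p_t * np.log(p_s + 1e-12))

# Illustrative logits for a 3-class problem
s = np.array([2.0, 1.0, 0.1])   # student
t = np.array([2.5, 0.5, 0.2])   # teacher
print(kd_loss(s, t))
```

By Gibbs' inequality this cross-entropy is minimized when the student matches the teacher's softened distribution, which is what naïve distillation optimizes; it says nothing about attribution maps, hence the distortion the slide reports.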
10. Unstructured Pruning
Weight magnitude based pruning
• Since unstructured pruning is a relatively mild form of compression,
attribution distortion occurs less than with other compression methods.
• Similar phenomena were observed: attribution scores and predictive
performance are preserved.
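Weight-magnitude pruning can be sketched as zeroing out the smallest-magnitude entries of a weight tensor. This is a generic NumPy illustration; the `sparsity` ratio and the weight values are arbitrary example choices:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Unstructured pruning: zero out the smallest-magnitude weights so
    that roughly `sparsity` of the entries become zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.sort(flat)[k - 1]        # k-th smallest magnitude
    mask = np.abs(weights) > threshold      # keep strictly larger entries
    return weights * mask

# Illustrative weight matrix
W = np.array([[0.05, -0.9,  0.3],
              [-0.02, 0.7, -0.4]])
pruned = magnitude_prune(W, sparsity=0.5)
print(pruned)  # the three smallest-magnitude entries are zeroed
```

Because individual weights rather than whole structures are removed, the surviving weights can still route roughly the same evidence, which is consistent with the milder attribution distortion reported above.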
11. Structured Pruning
L1-magnitude based pruning
• Similar phenomena were seen with the structured pruning method. Because
structured pruning removes whole channels at a time, attribution distortion
occurs more than with unstructured pruning.
• Our framework also helps preserve the attribution maps under structured pruning.
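L1-magnitude channel pruning can be sketched as ranking a convolution layer's output channels by the L1 norm of their filters and dropping the weakest. This is a generic illustration; the layer shape, `keep_ratio`, and random weights are example values, not the paper's configuration:

```python
import numpy as np

def l1_channel_prune(conv_weight, keep_ratio=0.5):
    """Structured pruning: rank output channels of a conv weight tensor
    (shape [out_ch, in_ch, kH, kW]) by the L1 norm of their filters and
    keep only the top `keep_ratio` fraction, removing whole channels."""
    out_ch = conv_weight.shape[0]
    norms = np.abs(conv_weight).reshape(out_ch, -1).sum(axis=1)
    n_keep = max(1, int(keep_ratio * out_ch))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # indices of kept channels
    return conv_weight[keep], keep

# Illustrative 4-channel conv layer; channel 1 is made clearly unimportant
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3, 3, 3))
W[1] *= 0.01

pruned, kept = l1_channel_prune(W, keep_ratio=0.5)
print(pruned.shape, kept)
```

Deleting whole channels removes entire feature detectors at once, which plausibly explains why attribution maps deform more here than under unstructured pruning.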
13. Conclusion
• Discovered that naïve network compression causes the Attribution Deformation
Problem, which had not been considered before.
• Proposed the Attribution Map Matching framework, which constrains the
attribution maps of the compressed network to follow those of the
pre-compression network.
• Experiments show that our method indeed preserves various kinds of
attribution, and also yields gains in predictive performance.