Geon Yeong Park, Sangmin Lee, Sang Wan Lee*, Jong Chul Ye*
KAIST
Training debiased subnetworks with
contrastive weight pruning
Background: Spurious correlation
Training
Cow in grassland Camel in desert
Green background → Cow
Desert background → Camel
Background: Spurious correlation
Deployment
Cow in desert Camel in grassland
This is Camel This is Cow
Background → Dataset bias
Background: Spurious correlation
Dataset Bias
Target
Cow Camel
Grass
Desert
In practice
Ideal
“Shortcut learning”
Shortcut learning: architectural design issue
Bias attribute
Invariant attribute
Biased
Any available channel transmitting the information of 𝒁𝒔𝒑 → networks would exploit 𝑍𝑠𝑝
𝑍𝑠𝑝
𝑍𝑖𝑛𝑣
Idea: debiased neural pruning
Bias attribute
Invariant attribute
Debiased
Pruning weights on 𝒁𝒔𝒑 → reduce the effective dimension of spurious features → improve generalization
𝑍𝑠𝑝
𝑍𝑖𝑛𝑣
How to discover the debiased subnetworks?
Observation 1. Potential limitations of existing algorithms
Training bound Test bound
Observation 2. Importance of bias-conflicting samples
New training bound → Test bound as $\phi \to 1 - \frac{1}{2p^{\eta}}$
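Motivating example – Stage 1. pretraining
Input: $X^e = (Z_{inv}^e, Z_{sp}^e)$ with $Z_{sp}^e = (Z_{sp,1}^e, Z_{sp,2}^e, \dots, Z_{sp,D}^e)$, and $Y^e, Z^e \in \{-1, 1\}$
$W(t)$: pretrained weights $W_{inv}(t), W_{sp,1}(t), W_{sp,2}(t), \dots, W_{sp,D}(t)$ connecting the inputs to the output $Y^e$
Training:
$Z_{inv}^e = Y^e$ (prob = 1)
$Z_{sp,i}^e = Y^e$ (prob = $p^e$), $-Y^e$ (prob = $1 - p^e$)
Test:
$Z_{inv}^e = Y^e$ (prob = 1)
$Z_{sp,i}^e = Y^e$ (prob = 0.5), $-Y^e$ (prob = 0.5)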
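The toy environment above can be simulated in a few lines. This is a minimal sketch, not the authors' code: the function names, sample sizes, the number of spurious features and the value p = 0.95 are all hypothetical choices made for illustration.

```python
import random

def sample_environment(n, num_spurious, p, seed=0):
    """Draw n samples (y, z_inv, z_sp) from the toy environment.

    z_inv always equals y; each spurious feature equals y with
    probability p and -y otherwise.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        z_inv = y
        z_sp = [y if rng.random() < p else -y for _ in range(num_spurious)]
        samples.append((y, z_inv, z_sp))
    return samples

def spurious_agreement(samples):
    """Fraction of spurious features that agree with the label."""
    agree = sum(z == y for y, _, z_sp in samples for z in z_sp)
    total = sum(len(z_sp) for _, _, z_sp in samples)
    return agree / total

# Hypothetical settings: a biased training environment and an unbiased test one.
train = sample_environment(n=5000, num_spurious=4, p=0.95, seed=0)
test = sample_environment(n=5000, num_spurious=4, p=0.5, seed=1)
print(f"train agreement: {spurious_agreement(train):.3f}")
print(f"test agreement:  {spurious_agreement(test):.3f}")
```

In the training environment the spurious features agree with the label about as often as p, so a network can lean on them; in the test environment they agree only about half the time and carry no information.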
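Motivating example – Stage 2. pruning
Inputs $Z_{inv}^e, Z_{sp,1}^e, Z_{sp,2}^e, \dots, Z_{sp,D}^e$ connect to the output $Y^e$ through the pretrained weights $W_{inv}(t), W_{sp,1}(t), W_{sp,2}(t), \dots, W_{sp,D}(t)$
Pruning parameters $\pi_{inv}, \pi_{sp,1}, \pi_{sp,2}, \dots, \pi_{sp,D}$: probability of preserving weights
Example of sampled masks (applied to the weights with $\odot$): $m_{inv} = 1$, $m_{sp,1} = 0$, $m_{sp,2} = 1$, $m_{sp,D} = 0$
<Loss function of $\pi$>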
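The mask sampling step can be illustrated as below. It is a sketch under the assumption that each mask entry is an independent Bernoulli draw with success probability equal to the corresponding pruning parameter, which is how the slide describes it ("probability of preserving weights"); the weight values and helper names are hypothetical.

```python
import random

def sample_mask(pi, rng):
    """Sample a binary mask: entry i is 1 (keep) with probability pi[i]."""
    return [1 if rng.random() < p else 0 for p in pi]

def apply_mask(weights, mask):
    """Element-wise product W ⊙ m; pruned weights become 0."""
    return [w * m for w, m in zip(weights, mask)]

rng = random.Random(0)
# Hypothetical pretrained weights, ordered as [W_inv, W_sp1, W_sp2, W_sp3].
weights = [1.0, 0.8, 0.6, 0.4]

# With probabilities of exactly 0 or 1 the mask is deterministic and
# reproduces the pattern on the slide: keep, prune, keep, prune.
hard_pi = [1.0, 0.0, 1.0, 0.0]
hard_mask = sample_mask(hard_pi, rng)
print(hard_mask, apply_mask(weights, hard_mask))

# With intermediate probabilities each draw gives a different subnetwork.
soft_pi = [0.9, 0.2, 0.5, 0.1]
print(sample_mask(soft_pi, rng))
```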
Observation 1: difficulty of learning pruning parameters
Theorem 1. (Generalization gap)
• Assume that $p^e > 1/2$ for a given training environment $e$ (biased setting). Then the upper bound of $\ell^e(\pi)$ is given as
$$\ell^e(\pi) \le 2\exp\left(-\frac{2\left(\pi_{inv} + (2p^e - 1)\sum_{i=1}^{D}\alpha_i(t)\,\pi_{sp,i}\right)^2}{4\sum_{i=1}^{D}\alpha_i(t)^2 + 1}\right),$$
where $\alpha_i(t) > 0$.
• However, given a test environment $e$ with $p^e = 1/2$,
$$\ell^e(\pi) \le 2\exp\left(-\frac{2\pi_{inv}^2}{4\sum_{i=1}^{D}\alpha_i(t)^2 + 1}\right).$$
TL;DR: reliance on $z_{sp}$ → mismatch of the bounds (failure of standard pruning algorithms)
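A small numeric check makes the mismatch concrete. The sketch below assumes the bound in Theorem 1 is read with sums over the spurious index i, i.e. 2·exp(−2(π_inv + (2p − 1)·Σ α_i π_sp,i)² / (4·Σ α_i² + 1)); the values of α and p are hypothetical.

```python
import math

def loss_upper_bound(pi_inv, pi_sp, alpha, p):
    """Evaluate the Theorem 1 bound (as read above) for preserve-probabilities pi."""
    margin = pi_inv + (2 * p - 1) * sum(a * q for a, q in zip(alpha, pi_sp))
    denom = 4 * sum(a * a for a in alpha) + 1
    return 2 * math.exp(-2 * margin ** 2 / denom)

alpha = [0.5, 0.5]      # hypothetical alpha_i(t) > 0
keep_sp = [1.0, 1.0]    # spurious weights always preserved
prune_sp = [0.0, 0.0]   # spurious weights always pruned

# Keeping spurious weights lowers the bound in the biased training
# environment (p = 0.95) but not in the unbiased test environment (p = 0.5).
train_keep = loss_upper_bound(1.0, keep_sp, alpha, p=0.95)
test_keep = loss_upper_bound(1.0, keep_sp, alpha, p=0.5)

# Pruning them makes the two bounds coincide.
train_prune = loss_upper_bound(1.0, prune_sp, alpha, p=0.95)
test_prune = loss_upper_bound(1.0, prune_sp, alpha, p=0.5)

print(f"keep spurious:  train={train_keep:.3f}  test={test_keep:.3f}")
print(f"prune spurious: train={train_prune:.3f}  test={test_prune:.3f}")
```

Minimizing the training bound therefore rewards large π_sp,i, even though those weights do nothing for the test bound.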
Observation 2: importance of bias-conflicting samples
$$P_{mix}^{\eta}(Z_{sp,i} \mid Y = y) = \phi\, P_{debias}^{\eta}(Z \mid Y = y) + (1 - \phi)\, P_{bias}^{\eta}(Z \mid Y = y)$$
• Thm 1: lack of bias-conflicting samples → spurious weights are preserved
• It motivates us to analyze the behavior in another environment 𝜂 where we
can systematically augment bias-conflicting samples
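Observation 2: importance of bias-conflicting samples
Theorem 2. (Training bound with the mixture distribution)
• Assume that $P_{mix}^{\eta}$ is biased. Then, $0 \le \phi \le 1 - \frac{1}{2p^{\eta}}$ and
$$\ell^{\eta}(\pi) \le 2\exp\left(-\frac{2\left(\pi_{inv} + \left(2p^{\eta}(1 - \phi) - 1\right)\sum_{i=1}^{D}\alpha_i(t)\,\pi_{sp,i}\right)^2}{4\sum_{i=1}^{D}\alpha_i(t)^2 + 1}\right).$$
• Furthermore, when $\phi = 1 - \frac{1}{2p^{\eta}}$, the mixture distribution is debiased and
$$\ell^{\eta}(\pi) \le 2\exp\left(-\frac{2\pi_{inv}^2}{4\sum_{i=1}^{D}\alpha_i(t)^2 + 1}\right).$$
TL;DR: the generalization gap is closed by sampling from the true debiasing distribution $P_{debias}^{\eta}$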
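The step from the general bound to the debiased one is a direct substitution, worked through here for reference. The derivation assumes the bound of Theorem 2 is read with sums over the spurious index $i$, which the slide extraction does not show explicitly.

```latex
\begin{align*}
\left. 2p^{\eta}(1-\phi) - 1 \,\right|_{\phi = 1 - \frac{1}{2p^{\eta}}}
  &= 2p^{\eta}\cdot\frac{1}{2p^{\eta}} - 1 = 0, \\
\ell^{\eta}(\pi)
  &\le 2\exp\!\left(-\frac{2\left(\pi_{inv} + 0\cdot\sum_{i}\alpha_i(t)\,\pi_{sp,i}\right)^2}{4\sum_{i}\alpha_i(t)^2 + 1}\right)
   = 2\exp\!\left(-\frac{2\pi_{inv}^2}{4\sum_{i}\alpha_i(t)^2 + 1}\right).
\end{align*}
```

Two sanity checks follow from the same coefficient: setting $\phi = 0$ recovers the form of the Theorem 1 training bound with $p^{\eta}$ in place of $p^e$, and the condition $2p^{\eta}(1-\phi) - 1 \ge 0$ is equivalent to $\phi \le 1 - \frac{1}{2p^{\eta}}$, the stated range for a biased mixture.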
Important clues
$$P_{mix}^{\eta}(Z_{sp,i} \mid Y = y) = \phi\, P_{debias}^{\eta}(Z \mid Y = y) + (1 - \phi)\, P_{bias}^{\eta}(Z \mid Y = y)$$
We have to:
• Approximate the unknown $P_{debias}^{\eta}$ with existing samples
• Modify the sampling strategy to simulate $P_{mix}^{\eta}$
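The effect of the mixing weight can be checked with a short simulation. This sketch rests on an assumption not spelled out on the slide: the debiasing component is taken to produce only bias-conflicting features (the spurious feature always disagrees with the label), while the biased component agrees with probability p. Under that reading, which matches the coefficient 2p(1 − φ) − 1 in Theorem 2, the mixture agrees with the label with probability (1 − φ)p. The function names and p = 0.95 are hypothetical.

```python
import random

def debiasing_phi(p):
    """Mixing weight phi = 1 - 1/(2p) at which the mixture is debiased."""
    return 1 - 1 / (2 * p)

def sample_mixture_feature(y, p, phi, rng):
    """Draw one spurious feature for label y from the assumed mixture."""
    if rng.random() < phi:
        return -y                          # debiasing component: always bias-conflicting
    return y if rng.random() < p else -y   # biased component: agrees with prob. p

rng = random.Random(0)
p = 0.95                                   # hypothetical bias level
phi = debiasing_phi(p)
n = 20000
agree = 0
for _ in range(n):
    y = rng.choice([-1, 1])
    agree += sample_mixture_feature(y, p, phi, rng) == y

print(f"phi = {phi:.4f}, agreement with label = {agree / n:.3f}")
```

With this φ the spurious feature agrees with the label about half the time, which is the unbiased test condition; in practice the debiasing component has to be approximated by oversampling the bias-conflicting samples that already exist.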
Objective
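$\ell_{debias}(S; W, \Theta) + \lambda_{\ell_1}\sum_{l,i}|\theta_{l,i}|$ : Uniform sparsity constraint
$\ell_{WCE}(S; W, \Theta) = \mathbb{E}_{m \sim G(\Theta)}\left[\lambda_{up}\,\ell_{bc}(S_{bc}) + \ell_{ba}(S_{ba})\right]$ : Oversampling strategy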
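The two terms can be sketched numerically as follows. This is an illustration only: it evaluates the weighted loss for a single sampled mask rather than the expectation over m ~ G(Θ), it assumes ℓ_bc and ℓ_ba are mean per-sample losses, and it does not model how ℓ_debias is built from ℓ_WCE, which the slide does not spell out. All names and numbers are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def weighted_ce(losses_bc, losses_ba, lambda_up):
    """lambda_up * loss on bias-conflicting set + loss on bias-aligned set."""
    return lambda_up * mean(losses_bc) + mean(losses_ba)

def l1_penalty(theta, lam):
    """lam * sum over layers l and entries i of |theta[l][i]|."""
    return lam * sum(abs(t) for layer in theta for t in layer)

# Hypothetical per-sample losses under one sampled mask.
losses_bc = [2.0, 1.0]          # few, hard bias-conflicting samples
losses_ba = [0.5, 0.5, 0.5]     # many, easy bias-aligned samples
theta = [[0.5, -1.0], [2.0]]    # pruning parameters, grouped by layer

wce = weighted_ce(losses_bc, losses_ba, lambda_up=10.0)
penalty = l1_penalty(theta, lam=0.01)
print(f"weighted loss = {wce:.3f}, l1 penalty = {penalty:.3f}, total = {wce + penalty:.3f}")
```

A large λ_up makes the scarce bias-conflicting samples dominate the objective, which plays the same role as oversampling them.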
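Model
Input: $x_{align} \in S_{ba}$, $x_{conflict} \in S_{bc}$ (oversampling)
1. Biased network: $\min_{\Theta}\ \ell_{debias} + \lambda_{\ell_1}|\Theta_{l,i}|$
2. Debiased subnetwork: $W = W \odot \mathbb{1}(\Theta^* > 0)$
3. Finetuning
Diagram labels: $w_{sp}$, $w_{inv}$, $Z_{sp}$, $Z_{inv}$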
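Step 2 of the pipeline, keeping only the weights whose learned pruning parameter is positive, can be sketched as below. The nested-list layout, the helper names and the numbers are hypothetical; a real implementation would operate on the framework's weight tensors.

```python
def extract_subnetwork(weights, theta):
    """Return W ⊙ 1(theta > 0): keep w where theta > 0, zero it otherwise."""
    return [
        [w if t > 0 else 0.0 for w, t in zip(w_row, t_row)]
        for w_row, t_row in zip(weights, theta)
    ]

def sparsity(theta):
    """Fraction of weights removed by the mask 1(theta > 0)."""
    flat = [t for row in theta for t in row]
    return sum(t <= 0 for t in flat) / len(flat)

# Hypothetical pretrained weights and learned pruning parameters.
weights = [[0.3, -1.2], [0.7, 0.5]]
theta_star = [[1.5, -0.2], [0.0, 2.0]]

pruned = extract_subnetwork(weights, theta_star)
print(pruned, sparsity(theta_star))
```

The surviving weights keep their pretrained values and are then finetuned in step 3, with no reset to their initial values.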
Results: Unbiased test accuracy
(a) CMNIST (b) CIFAR10-C (c) BFFHQ
Results: ablation study
Pruning contributes significantly: (1→2, +7.19%),
(3→5, +11.59%) or (4→6, +8.68%).
The proposed method does not require
weight reset [ref]
[ref]: Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635.
Results: sparsity & sensitivity analysis
Accuracy increases as more (potentially
biased) weights are pruned out
• Trade-off between performance and sparsity does exist
• Proposed framework is reasonably tolerant to high sparsity.
Results: dependency on bias-capturing models
DCWP may perform reasonably well even with a limited number and quality of
bias-conflicting samples.
Results: visualization
Summary
• We presented a novel functional subnetwork probing method for OOD generalization.
• We provided theoretical insights and empirical evidence to show that the minority
samples provide an important clue for probing the optimal unbiased subnetworks.
• The proposed method is memory efficient and potentially compatible with many other
debiasing methods.
