Review: [SIGIR'22] Interpolative Distillation for Unifying Biased and Debiased Recommendation
1. Interpolative Distillation for Unifying Biased
and Debiased Recommendation
SIGIR’22, Sihao Ding (USTC) et al.
POSTECH DI Lab
Presenter: Changsoo Kwak
2022.5.24
2. Motivation
▪ Most recommender systems are evaluated on one of two types of test sets
▪ Normal biased test set (𝐷𝑏)
▪ Debiased test set (𝐷𝑑)
[1] Self-supervised Graph Learning for Recommendation, Jiancan Wu (USTC) et al., SIGIR’21
Existing models do not perform well on both test sets
A biased or a debiased model only reflects part of the whole picture
3. Intuitive solution?
▪ Unifying 𝐷𝑏, 𝐷𝑑
▪ Usually |𝐷𝑏| ≫ |𝐷𝑑|
▪ Train two models on 𝐷𝑏 and 𝐷𝑑 respectively, and ensemble them (a plain fixed-coefficient ensemble is sketched after this list)
▪ It is unclear which types of users/items each model is strong or weak at
▪ Existing ensemble strategies are not tailored to the win-win recommendation scenario
▪ Possible solution?
▪ Distillation!
▪ Aggregate the two models at the level of individual user-item pairs
Determine the distillation coefficient automatically for each pair
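As a point of contrast, here is a minimal sketch of the plain ensemble mentioned above, which mixes the two pre-trained models' predicted ratings with one global coefficient; the function name and the scalar alpha are illustrative assumptions, not from the paper:

```python
import numpy as np

def static_ensemble(y_biased: np.ndarray, y_debiased: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the biased and debiased models' predicted ratings.

    alpha is a single hand-tuned coefficient shared by every user-item pair,
    so the ensemble cannot adapt to which model is stronger for a given pair.
    """
    return alpha * y_biased + (1.0 - alpha) * y_debiased
```

This per-dataset (rather than per-pair) weighting is exactly the limitation that the pair-level distillation on the next slide is meant to address.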
4. Proposed model (InterD)
Environment 𝐸 ∈ {𝑒𝑏, 𝑒𝑑}
Probability of environment given user-item pair
Existing models only consider one environment
- Only achieve good performance on one of 𝐷𝑏 or 𝐷𝑑
Predicted rating under a given environment assumption
Let the student model learn the predicted ratings generated by a
fine-grained weighted sum of the pre-trained teachers' predictions,
conditioned on the environment (a minimal sketch follows below)
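A minimal sketch of this pair-level interpolation, assuming inverse-distance weighting between the student's current prediction and each teacher's prediction, plus an MSE distillation loss; the function names, the specific weighting rule, and the loss are my assumptions for illustration, not necessarily InterD's exact formulation:

```python
import numpy as np

def interpolated_target(y_student, y_biased, y_debiased, eps=1e-8):
    """Per user-item pair, interpolate the two teachers' predicted ratings.

    The teacher whose prediction is closer to the student's current prediction
    receives the larger weight (assumed inverse-distance weighting), giving a
    fine-grained, automatically determined coefficient for every pair.
    """
    d_b = np.abs(y_student - y_biased)    # distance to the biased teacher
    d_d = np.abs(y_student - y_debiased)  # distance to the debiased teacher
    alpha = d_d / (d_b + d_d + eps)       # weight on the biased teacher
    return alpha * y_biased + (1.0 - alpha) * y_debiased

def distillation_loss(y_student, y_biased, y_debiased):
    """MSE between the student's predictions and the interpolated targets."""
    target = interpolated_target(y_student, y_biased, y_debiased)
    return float(np.mean((y_student - target) ** 2))

# Toy usage with three user-item pairs.
y_student  = np.array([3.0, 4.2, 1.5])
y_biased   = np.array([3.5, 4.0, 2.5])
y_debiased = np.array([2.0, 4.5, 1.0])
print(distillation_loss(y_student, y_biased, y_debiased))
```

Because the student's own prediction enters the weights, the distillation target moves with the student during training, which is what motivates the curriculum/self-paced reading at the end of this review.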
RCT: Randomized Controlled Trial (https://books.google.co.kr/books?id=JUTqDwAAQBAJ&pg=PA244&lpg=PA244&dq=yahoo!r3+randomized+controlled+trial&source=bl&ots=0cagKMc4KG&sig=ACfU3U3oFb-FZsxO3PuYDFYRz6gX9O97tA&hl=ko&sa=X&ved=2ahUKEwj5qp-psev3AhWim1YBHfVgC2QQ6AF6BAgDEAM#v=onepage&q=yahoo!r3%20randomized%20controlled%20trial&f=false)
In other words, the student tends to learn the easier aspects of the knowledge, since a smaller distance makes it easier to follow the corresponding teacher
Since the student follows whichever side is easier for it (the teacher with the smaller distance), could this be viewed as a form of curriculum learning?
Since the student's own prediction enters the weight computation, could this also be viewed as self-paced learning?