Uncoupled Regression from
Comparison Data
Liyuan Xu
Gatsby Unit@UCL, Former AIP member
(Twitter: @ly9988)
Disclaimer
This talk is mainly based on our paper at NeurIPS 2019.
Introduction
Regression Problem
(Coupled) Data: (x1, y1), (x2, y2), … ∼ P_XY
Learn f(X) ≃ 𝔼[Y|X]
Correspondence in the data is assumed
Uncoupled Regression Problem
Uncoupled Data: x1, x2, x3, … ∼ P_X and y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X]
Regression without data correspondence
Uncoupled Regression
Uncoupled regression is impossible by itself.
→ What is a practically feasible assumption?
Application of Uncoupled Regression
• Merging two datasets [Carpentier+, 2016]
• X: income, Y: housing price
• The government publishes X; a bank publishes Y
• How to merge two datasets collected independently?
Application of Uncoupled Regression
• Privacy-Preserving Machine Learning [Xu et al. 2019]
• Consider the case where Y contains sensitive information
• Coupled records (Xi, Yi) may be exposed in a security incident
Application of Uncoupled Regression
• Privacy-Preserving Machine Learning [Xu et al. 2019]
• Consider the case where Y contains sensitive information
• Publishing Xi and Yi separately as anonymized data removes the coupling
Data Fusion / Matching
Uncoupled Data with Context Z: (x1, z1), (x2, z2), … ∼ P_XZ and (y1, z′1), (y2, z′2), … ∼ P_YZ
Learn f(X) ≃ 𝔼[Y|X]
Use the contextual data Z to merge the two distributions
→ Data Fusion / Matching
Isometric Uncoupled Regression [Carpentier+, 2016]
Uncoupled Data: x1, x2, x3, … ∼ P_X and y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X], assuming 𝔼[Y|X] is monotonic
Monotonicity makes uncoupled regression feasible
Isometric Uncoupled Regression [Carpentier+, 2016]
• Advantage
• Consistency is proved [Rigollet et al. 2018]
→ The optimal model can be learned as the data increases
• Limitation
• The monotonicity assumption may be too strong
• Is housing price (Y) really monotonic in income (X)?
• Only applicable to the case X ∈ ℝ
• Need to know the noise distribution
• Solves the problem Y = f*(X) + ε with known P(ε)
High-level concept
Message in [Carpentier+, 2016]:
Uncoupled Data + Order Info. → Regression
(order info is provided by the monotonicity assumption)
Our Idea:
Uncoupled Data + Order Info. → Regression
(order info is learned from pairwise comparison data)
Problem Setting
• Pairwise Comparison Data
• Originally considered in the ranking context
• Sample two data points (X, Y), (X′, Y′) ∼ P_XY
• Obtain pairwise comparison data (X⁺, X⁻) as
    X⁺ = X, X⁻ = X′ (if Y > Y′)
    X⁺ = X′, X⁻ = X (if Y ≤ Y′)
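As a minimal sketch (my own illustration, not code from the paper), pairwise comparison data can be simulated from coupled samples exactly as defined above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_comparisons(X, Y, m, rng):
    """Draw m pairs; the point with the larger Y becomes X+ (ties go to X-)."""
    i = rng.integers(0, len(Y), size=m)  # index of (X, Y)
    j = rng.integers(0, len(Y), size=m)  # index of (X', Y')
    win = (Y[i] > Y[j])[:, None]         # condition Y > Y'
    x_pos = np.where(win, X[i], X[j])
    x_neg = np.where(win, X[j], X[i])
    return x_pos, x_neg

# toy joint distribution P_XY
X = rng.normal(size=(1000, 2))
Y = X[:, 0] + 0.1 * rng.normal(size=1000)
x_pos, x_neg = sample_comparisons(X, Y, m=5000, rng=rng)
print(x_pos[:, 0].mean(), x_neg[:, 0].mean())  # X+ skews toward large-Y regions
```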
Uncoupled Regression from Pairwise Comparison
Uncoupled Data: x1, x2, x3, … ∼ P_X and y1, y2, y3, … ∼ P_Y
Pairwise Comparison Data: (x⁺1, x⁻1), (x⁺2, x⁻2), … ∼ P_{X⁺,X⁻}
Learn f(X) ≃ 𝔼[Y|X]
Uncoupled Regression from Pairwise Comparison
We propose two approaches: Risk Approximation & Target Transformation
• Advantage
• Puts no assumption on 𝔼[Y|X]
• No need to know the noise distribution
• Limitation
• Not consistent
• But the deviation from the optimal model is bounded
• Empirically it works
Risk Approximation Approach
Formal Problem Settings
• Data given:
• Unlabeled Data: D_X = {x1, x2, …, xn} ∼ P_X
• Target Set: D_Y = {y1, y2, …, yn} ∼ P_Y
• Pairwise Comparison Data: D_{X⁺,X⁻} = {(x⁺1, x⁻1), …, (x⁺m, x⁻m)} ∼ P_{X⁺,X⁻}
• Goal: Find f* that satisfies
f* = arg min_f R(f), where R(f) = 𝔼[(f(X) − Y)²]
Risk Approximation
Loss Decomposition:
R(f) = 𝔼_{X,Y}[(f(X) − Y)²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[Y f(X)] + const.
The first term is estimated from the unlabeled data D_X.
The cross term 𝔼_{X,Y}[Y f(X)] is approximated by a linear combination of 𝔼_{X⁺}[f(X⁺)] and 𝔼_{X⁻}[f(X⁻)].
Risk Approximation
Lemma 1 [Xu et al. 2019]: For any function f,
𝔼_{X⁺}[f(X⁺)] = 2𝔼_{X,Y}[F_Y(Y) f(X)]
𝔼_{X⁻}[f(X⁻)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) f(X)],
where F_Y is the CDF of Y.
If we can learn w1, w2 such that
Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y)),
then
𝔼_{X,Y}[Y f(X)] ≃ w1 𝔼_{X⁺}[f(X⁺)] + w2 𝔼_{X⁻}[f(X⁻)]
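Lemma 1 is easy to sanity-check numerically. Below is a quick Monte Carlo check of the first identity (my own illustration; np.tanh is just an arbitrary test function):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

X = rng.normal(size=n)
Y = X + rng.normal(size=n)           # some joint distribution P_XY
f = np.tanh                          # arbitrary test function of X

# comparison data: X+ is the x whose y wins the comparison
i, j = rng.integers(0, n, size=n), rng.integers(0, n, size=n)
x_pos = np.where(Y[i] > Y[j], X[i], X[j])

# F_Y(Y) via ranks (empirical CDF evaluated at each sample)
F_Y = (np.argsort(np.argsort(Y)) + 1) / n

print(f(x_pos).mean())               # E[f(X+)]
print(2 * (F_Y * f(X)).mean())       # 2 E[F_Y(Y) f(X)] -- should match
```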
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
The CDF F_Y is estimated via the empirical CDF of D_Y:
F̂_Y(y) = (1/|D_Y|) Σ_i 1[y_i ≤ y]
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
The weights ŵ1, ŵ2 are learned by
ŵ1, ŵ2 = arg min_{w1,w2} Σ_{i=1}^{|D_Y|} (y_i − 2w1 F̂_Y(y_i) − 2w2 (1 − F̂_Y(y_i)))²
Recall, we want Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y))
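Steps 1 and 2 reduce to an empirical CDF plus an ordinary least-squares fit in (w1, w2). A minimal sketch, assuming F_Y is estimated by the empirical CDF of D_Y:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(size=1000)              # stand-in for the target set D_Y

# Step 1: empirical CDF of D_Y, evaluated at each y_i
F_hat = (np.argsort(np.argsort(y)) + 1) / len(y)

# Step 2: least squares for y_i ≈ 2 w1 F_hat(y_i) + 2 w2 (1 - F_hat(y_i))
A = np.column_stack([2 * F_hat, 2 * (1 - F_hat)])
(w1_hat, w2_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w1_hat, w2_hat)
```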
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
The model f̂ is learned by
f̂ = arg min_f (1/|D_X|) Σ_{i=1}^{|D_X|} f(x_i)² − (2/|D_{X⁺,X⁻}|) Σ_{j=1}^{|D_{X⁺,X⁻}|} (ŵ1 f(x⁺_j) + ŵ2 f(x⁻_j)),
where the first term estimates 𝔼_X[f²(X)] and the second estimates 2𝔼_{X,Y}[Y f(X)].
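For a linear model f(x) = θᵀx, the empirical objective above is a quadratic θᵀAθ − 2θᵀb with A = (1/|D_X|) Σ x_i x_iᵀ and b = (1/m) Σ (ŵ1 x⁺_j + ŵ2 x⁻_j), so Step 3 has the closed form θ = A⁻¹b. A sketch under that linear-model assumption (toy data; the weights would come from Step 2):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 5000

# toy data and comparison pairs, built as on the earlier slides
X = rng.normal(size=(n, 2))
Y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)
i, j = rng.integers(0, n, size=m), rng.integers(0, n, size=m)
win = (Y[i] > Y[j])[:, None]
x_pos, x_neg = np.where(win, X[i], X[j]), np.where(win, X[j], X[i])

def fit_linear_ra(X, x_pos, x_neg, w1, w2):
    A = X.T @ X / len(X)                         # estimates E[X Xᵀ]
    b = (w1 * x_pos + w2 * x_neg).mean(axis=0)   # estimates w1 E[X+] + w2 E[X-]
    return np.linalg.solve(A, b)                 # θ minimizing θᵀAθ − 2θᵀb

theta = fit_linear_ra(X, x_pos, x_neg, w1=1.5, w2=-1.5)  # placeholder weights
print(theta)  # should roughly align with the direction of (1, -2)
```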
Theoretical Property
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
Here, Err(w1, w2) is the approximation error
Err(w1, w2) = 𝔼_Y[(Y − 2w1 F_Y(Y) − 2w2 (1 − F_Y(Y)))²]
→ If the loss is approximated well, the bias in the model is small
Theoretical Property
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
In particular, if Y ∼ Unif[a, b] then Err(b/2, a/2) = 0
Theoretical Property
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
In general, Err > 0:
① Theoretically, it is inevitable…
② Empirically, it works!
Theoretical Property
There exist two distributions that cannot be distinguished by P_X, P_Y, and P_{X⁺,X⁻}.
Theoretical Property
(Counterexample figure: two discrete joint distributions P_XY and P̃_XY over a small grid of (X, Y) values, with different cell probabilities.)
Same P_X, P_Y, P_{X⁺,X⁻}, but 𝔼_P[Y|X] ≠ 𝔼_P̃[Y|X]
Empirical Result
• Learn a linear model on UCI datasets
• Uncoupled regression
• Use all features for D_X, all targets for D_Y
• Note: no correspondence is given
• Generate 5000 pairs of D_{X⁺,X⁻}
• Supervised regression
• Use the entire coupled data (X, Y)
Empirical Result
• MSE of linear models on UCI datasets
→ Uncoupled regression can yield almost the same MSE as supervised learning!
Conclusion So Far
• Uncoupled Regression from Pairwise Comparison
• Solves the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X⁺,X⁻}
• Introduced an approach based on risk approximation
• Theoretical and empirical results are given
Modeling CDF from Pairwise Comparison Data
Theoretical Property (Recap)
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
In particular, if Y ∼ Unif[a, b] then Err(b/2, a/2) = 0
→ We can learn the optimal model for uniform Y
Predicting Percentile
• Optimizing Direct Marketing
• X: customer features, Y: probability of purchase
• Send discount tickets to the top 1% of potential customers
• The CDF F_Y(Y) is more the target of interest than Y itself
• Predicting Y might not be the best idea…
• Due to class imbalance, all Y can be very small
Predicting Percentile
• Sometimes the percentile is the target of interest
• Learn f(X) that minimizes R(f) = 𝔼[(F_Y(Y) − f(X))²]
• F_Y(Y) follows Unif[0,1]
→ We can learn the optimal f from pairwise comparison
Motivating Example for Predicting Percentile
• Online Chess Rating
• X: user attributes, Y: abstract measure of “skill”
• Skill is compared through games
• Pairwise comparison data is given naturally
• We want to know the percentile in the skill ranking
Simple Solution
• Problem (Recap)
• Given pairwise comparison data (X⁺, X⁻)
• Predict the conditional expectation of the CDF, 𝔼[F_Y(Y)|X]
• Simple Solution
• Learn a ranking model r(X) from (X⁺, X⁻)
• Transform r(X) to 𝔼[F_Y(Y)|X]
Pairwise-Ranking based Approach
• Pairwise Learning to Rank
• Learn a ranker r(X) which minimizes the rank loss
• e.g. SVMRank, RankBoost
• Given test data X_test and the rank model,
𝔼[F_Y(Y)|X_test] ≃ (rank of X_test in the entire data) / (number of data points)
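A sketch of this conversion (my own illustration; the scores below are synthetic stand-ins for the outputs of any pairwise ranker such as SVMRank):

```python
import numpy as np
from scipy.stats import rankdata

def percentile_from_ranker(scores):
    """E[F_Y(Y)|x] ≈ rank of x's score in the data pool / pool size."""
    return rankdata(scores) / len(scores)

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)   # pretend these are r(X) on a data pool
print(percentile_from_ranker(scores)[:5])
```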
Weakness of the Pairwise-Ranking based Approach
• The original goal is to minimize R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
• The rank model r(X) minimizes the rank loss R_r(r)
• A small R_r(r) does not necessarily mean a small R(f)
→ We aim to directly minimize R(f)
Direct Minimization
Lemma 1 [Xu et al. 2019]: For any function h,
𝔼_{X⁺}[h(X⁺)] = 2𝔼_{X,Y}[F_Y(Y) h(X)]
𝔼_{X⁻}[h(X⁻)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) h(X)]
From this lemma, we have
R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[F_Y(Y) f(X)] + const.
     = 𝔼_X[f²(X)] − 𝔼_{X⁺}[f(X⁺)] + const.
Empirical Approximation
• The original loss R(f) (without the constant):
R(f) = 𝔼_X[f²(X)] − 𝔼_{X⁺}[f(X⁺)]
• The empirical loss R̂(f):
R̂(f) = (1/|D_X|) Σ_{x_i ∈ D_X} f²(x_i) − (1/|D_{X⁺,X⁻}|) Σ_{(x⁺_i, x⁻_i) ∈ D_{X⁺,X⁻}} f(x⁺_i)
• The deviation is bounded:
R(f) ≤ R̂(f) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2})
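For a linear percentile model f(x) = θᵀx, R̂(f) is again quadratic in θ, namely θᵀAθ − θᵀb with b the mean of the x⁺ samples, so the minimizer is θ = A⁻¹b/2. A minimal sketch under that linear-model assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 10000
X = rng.normal(size=(n, 2))
Y = X @ np.array([2.0, 0.0]) + 0.5 * rng.normal(size=n)

# comparison data built as before: X+ is the winner's x
i, j = rng.integers(0, n, size=m), rng.integers(0, n, size=m)
x_pos = np.where((Y[i] > Y[j])[:, None], X[i], X[j])

A = X.T @ X / n                      # empirical E[X Xᵀ]
b = x_pos.mean(axis=0)               # empirical E[X+]
theta = 0.5 * np.linalg.solve(A, b)  # minimizer of the empirical loss
print(theta)  # f(x) = θᵀx should increase with the first feature, like F_Y(Y)
```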
Summary
• We can learn 𝔼[F_Y(Y)|X] only from D_X and D_{X⁺,X⁻}
• The empirical loss to minimize is
R̂(f) = (1/|D_X|) Σ_{x_i ∈ D_X} f²(x_i) − (1/|D_{X⁺,X⁻}|) Σ_{(x⁺_i, x⁻_i) ∈ D_{X⁺,X⁻}} f(x⁺_i)
Can we use this for the original regression problem?
Target Transformation Approach
Target Transformation
• From the previous discussion,
• We can learn the optimal model for F_Y(Y)
• We can learn the CDF function F_Y
• Target Transformation Approach [Xu et al. 2019]
1. Learn the function F̂ that minimizes R_F(F) = 𝔼_{X,Y}[(F_Y(Y) − F(X))²]
2. Output the regression model as f̂ = F_Y^{−1}(F̂(X))
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
The CDF F_Y is estimated via the empirical CDF of D_Y.
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
The CDF model F̂ is learned by
F̂ = arg min_F (1/|D_X|) Σ_{i=1}^{|D_X|} F(x_i)² − (1/|D_{X⁺,X⁻}|) Σ_{j=1}^{|D_{X⁺,X⁻}|} F(x⁺_j),
where the first term estimates 𝔼_X[F²(X)] and the second estimates 2𝔼_{X,Y}[F_Y(Y) F(X)].
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
The regression model f̂ is obtained by
f̂ = F_Y^{−1}(F̂(X))
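Step 3 maps CDF-scale predictions back to the target scale; with the empirical CDF of D_Y, the inverse F_Y^{−1} is just an empirical quantile. A minimal sketch (F_hat is a hypothetical learned CDF model, standing in for the output of Step 2):

```python
import numpy as np

def target_transform(F_hat, x, D_Y):
    """f_hat(x) = empirical quantile of D_Y at the predicted percentile."""
    u = np.clip(F_hat(x), 0.0, 1.0)   # predicted F_Y(Y), clipped to [0, 1]
    return np.quantile(D_Y, u)        # empirical inverse CDF of D_Y

rng = np.random.default_rng(0)
D_Y = rng.exponential(size=1000)                # target set
F_hat = lambda x: 1 / (1 + np.exp(-x[:, 0]))    # hypothetical CDF model
x_test = rng.normal(size=(5, 2))
print(target_transform(F_hat, x_test, D_Y))
```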
Experiment on UCI
• RA: Risk Approximation
• TT: Target Transformation
• SVMRank: TT approach where F̂ is learned based on SVMRank
Conclusion
• Uncoupled Regression from Pairwise Comparison
• Solves the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X⁺,X⁻}
• Approach based on risk approximation
• Theoretical and empirical results are given
• Approach based on target transformation
• (Theoretical) and empirical results are given
Thank you!
• Follow me on Twitter! (@ly9988)