Uncoupled Regression from
Comparison Data
Liyuan Xu
Gatsby Unit@UCL, Former AIP member
(Twitter: @ly9988)
Disclaimer
This talk is mainly based on our paper at NeurIPS 2019.
Introduction
Regression Problem
(Coupled) Data: (x1, y1), (x2, y2), … ∼ P_XY
Learn f(X) ≃ 𝔼[Y|X]
Correspondence in the data is assumed
Uncoupled Regression Problem
Uncoupled Data: x1, x2, x3, … ∼ P_X and y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X]
Regression without data correspondence
Uncoupled Regression
Uncoupled regression is impossible by itself.
→ What is a practically feasible assumption?
Application of Uncoupled Regression
• Merging two datasets [Carpentier+, 2016]
• X: income, Y: housing price
• The government publishes X; a bank publishes Y
• How to merge two datasets collected independently?
Application of Uncoupled Regression
• Privacy-Preserving Machine Learning [Xu et al. 2019]
• Consider the case where Y contains sensitive information
• Coupled records (Xi, Yi) may be exposed in a security incident
Application of Uncoupled Regression
• Privacy-Preserving Machine Learning [Xu et al. 2019]
• Consider the case where Y contains sensitive information
• Publishing Xi and Yi separately as anonymized data removes the coupling
Data Fusion / Matching
Uncoupled Data with Context Z: (x1, z1), (x2, z2), … ∼ P_XZ and (y1, z′1), (y2, z′2), … ∼ P_YZ
Learn f(X) ≃ 𝔼[Y|X]
Use the contextual data Z to merge the two distributions
→ Data Fusion / Matching
Isometric Uncoupled Regression [Carpentier+, 2016]
Uncoupled Data: x1, x2, x3, … ∼ P_X and y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X], assuming 𝔼[Y|X] is monotonic
Monotonicity makes uncoupled regression feasible
Isometric Uncoupled Regression [Carpentier+, 2016]
• Advantage
• Consistency is proved [Rigollet et al. 2018]
→ The optimal model can be learned as the data increases
• Limitation
• The monotonicity assumption may be too strong
• Is housing price (Y) really monotonic in income (X)?
• Only applicable to the case X ∈ ℝ
• Need to know the noise distribution
• Solves the problem Y = f*(X) + ε with known P(ε)
High-level concept
Message in [Carpentier+, 2016]:
Uncoupled Data + Order Info. → Regression
(order info is provided by the monotonicity assumption)
Our Idea:
Uncoupled Data + Order Info. → Regression
(order info is learned from pairwise comparison data)
Problem Setting
• Pairwise Comparison Data
• Originally considered in the ranking context
• Sample two data points (X, Y), (X′, Y′) ∼ P_XY
• Obtain pairwise comparison data (X⁺, X⁻) as
    X⁺ = X, X⁻ = X′ (if Y > Y′)
    X⁺ = X′, X⁻ = X (if Y ≤ Y′)
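As a minimal sketch (my own illustration, not code from the paper), pairwise comparison data can be simulated from coupled samples exactly as defined above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_comparisons(X, Y, m, rng):
    """Draw m pairs; the point with the larger Y becomes X+ (ties go to X-)."""
    i = rng.integers(0, len(Y), size=m)  # index of (X, Y)
    j = rng.integers(0, len(Y), size=m)  # index of (X', Y')
    win = (Y[i] > Y[j])[:, None]         # condition Y > Y'
    x_pos = np.where(win, X[i], X[j])
    x_neg = np.where(win, X[j], X[i])
    return x_pos, x_neg

# toy joint distribution P_XY
X = rng.normal(size=(1000, 2))
Y = X[:, 0] + 0.1 * rng.normal(size=1000)
x_pos, x_neg = sample_comparisons(X, Y, m=5000, rng=rng)
print(x_pos[:, 0].mean(), x_neg[:, 0].mean())  # X+ skews toward large-Y regions
```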
Uncoupled Regression from Pairwise Comparison
Uncoupled Data: x1, x2, x3, … ∼ P_X and y1, y2, y3, … ∼ P_Y
Pairwise Comparison Data: (x⁺1, x⁻1), (x⁺2, x⁻2), … ∼ P_{X⁺,X⁻}
Learn f(X) ≃ 𝔼[Y|X]
Uncoupled Regression from Pairwise Comparison
We propose two approaches: Risk Approximation & Target Transformation
• Advantage
• Puts no assumption on 𝔼[Y|X]
• No need to know the noise distribution
• Limitation
• Not consistent
• But the deviation from the optimal model is bounded
• Empirically it works
Risk Approximation Approach
Formal Problem Settings
• Data given:
• Unlabeled Data: D_X = {x1, x2, …, xn} ∼ P_X
• Target Set: D_Y = {y1, y2, …, yn} ∼ P_Y
• Pairwise Comparison Data: D_{X⁺,X⁻} = {(x⁺1, x⁻1), …, (x⁺m, x⁻m)} ∼ P_{X⁺,X⁻}
• Goal: Find f* that satisfies
f* = arg min_f R(f), where R(f) = 𝔼[(f(X) − Y)²]
Risk Approximation
Loss Decomposition:
R(f) = 𝔼_{X,Y}[(f(X) − Y)²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[Y f(X)] + const.
The first term is estimated from the unlabeled data D_X.
The cross term 𝔼_{X,Y}[Y f(X)] is approximated by a linear combination of 𝔼_{X⁺}[f(X⁺)] and 𝔼_{X⁻}[f(X⁻)].
Risk Approximation
Lemma 1 [Xu et al. 2019]: For any function f,
𝔼_{X⁺}[f(X⁺)] = 2𝔼_{X,Y}[F_Y(Y) f(X)]
𝔼_{X⁻}[f(X⁻)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) f(X)],
where F_Y is the CDF of Y.
If we can learn w1, w2 such that
Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y)),
then
𝔼_{X,Y}[Y f(X)] ≃ w1 𝔼_{X⁺}[f(X⁺)] + w2 𝔼_{X⁻}[f(X⁻)]
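Lemma 1 is easy to sanity-check numerically. Below is a quick Monte Carlo check of the first identity (my own illustration; np.tanh is just an arbitrary test function):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

X = rng.normal(size=n)
Y = X + rng.normal(size=n)           # some joint distribution P_XY
f = np.tanh                          # arbitrary test function of X

# comparison data: X+ is the x whose y wins the comparison
i, j = rng.integers(0, n, size=n), rng.integers(0, n, size=n)
x_pos = np.where(Y[i] > Y[j], X[i], X[j])

# F_Y(Y) via ranks (empirical CDF evaluated at each sample)
F_Y = (np.argsort(np.argsort(Y)) + 1) / n

print(f(x_pos).mean())               # E[f(X+)]
print(2 * (F_Y * f(X)).mean())       # 2 E[F_Y(Y) f(X)] -- should match
```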
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
The CDF F_Y is estimated via the empirical CDF of D_Y:
F̂_Y(y) = (1/|D_Y|) Σ_i 1[y_i ≤ y]
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
The weights ŵ1, ŵ2 are learned by
ŵ1, ŵ2 = arg min_{w1,w2} Σ_{i=1}^{|D_Y|} (y_i − 2w1 F̂_Y(y_i) − 2w2 (1 − F̂_Y(y_i)))²
Recall, we want Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y))
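Steps 1 and 2 reduce to an empirical CDF plus an ordinary least-squares fit in (w1, w2). A minimal sketch, assuming F_Y is estimated by the empirical CDF of D_Y:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(size=1000)              # stand-in for the target set D_Y

# Step 1: empirical CDF of D_Y, evaluated at each y_i
F_hat = (np.argsort(np.argsort(y)) + 1) / len(y)

# Step 2: least squares for y_i ≈ 2 w1 F_hat(y_i) + 2 w2 (1 - F_hat(y_i))
A = np.column_stack([2 * F_hat, 2 * (1 - F_hat)])
(w1_hat, w2_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w1_hat, w2_hat)
```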
Risk Approximation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the weights ŵ1, ŵ2 for the loss
• Step 3: Learn the model f̂
The model f̂ is learned by
f̂ = arg min_f (1/|D_X|) Σ_{i=1}^{|D_X|} f(x_i)² − (2/|D_{X⁺,X⁻}|) Σ_{j=1}^{|D_{X⁺,X⁻}|} (ŵ1 f(x⁺_j) + ŵ2 f(x⁻_j)),
where the first term estimates 𝔼_X[f²(X)] and the second estimates 2𝔼_{X,Y}[Y f(X)].
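For a linear model f(x) = θᵀx, the empirical objective above is a quadratic θᵀAθ − 2θᵀb with A = (1/|D_X|) Σ x_i x_iᵀ and b = (1/m) Σ (ŵ1 x⁺_j + ŵ2 x⁻_j), so Step 3 has the closed form θ = A⁻¹b. A sketch under that linear-model assumption (toy data; the weights would come from Step 2):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 5000

# toy data and comparison pairs, built as on the earlier slides
X = rng.normal(size=(n, 2))
Y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)
i, j = rng.integers(0, n, size=m), rng.integers(0, n, size=m)
win = (Y[i] > Y[j])[:, None]
x_pos, x_neg = np.where(win, X[i], X[j]), np.where(win, X[j], X[i])

def fit_linear_ra(X, x_pos, x_neg, w1, w2):
    A = X.T @ X / len(X)                         # estimates E[X Xᵀ]
    b = (w1 * x_pos + w2 * x_neg).mean(axis=0)   # estimates w1 E[X+] + w2 E[X-]
    return np.linalg.solve(A, b)                 # θ minimizing θᵀAθ − 2θᵀb

theta = fit_linear_ra(X, x_pos, x_neg, w1=1.5, w2=-1.5)  # placeholder weights
print(theta)  # should roughly align with the direction of (1, -2)
```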
Theoretical Property
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
Here, Err(w1, w2) is the approximation error
Err(w1, w2) = 𝔼_Y[(Y − 2w1 F_Y(Y) − 2w2 (1 − F_Y(Y)))²]
→ If the loss is approximated well, the bias in the model is small
Theoretical Property
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
In particular, if Y ∼ Unif[a, b] then Err(b/2, a/2) = 0
Theoretical Property
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
In general, Err > 0:
① Theoretically, it is inevitable…
② Empirically, it works!
Theoretical Property
There exist two distributions that cannot be distinguished by P_X, P_Y, and P_{X⁺,X⁻}.
Theoretical Property
(Counterexample figure: two discrete joint distributions P_XY and P̃_XY over a small grid of (X, Y) values, with different cell probabilities.)
Same P_X, P_Y, P_{X⁺,X⁻}, but 𝔼_P[Y|X] ≠ 𝔼_P̃[Y|X]
Empirical Result
• Learn a linear model on UCI datasets
• Uncoupled regression
• Use all features for D_X, all targets for D_Y
• Note: no correspondence is given
• Generate 5000 pairs of D_{X⁺,X⁻}
• Supervised regression
• Use the entire coupled data (X, Y)
Empirical Result
• MSE of linear models on UCI datasets
→ Uncoupled regression can yield almost the same MSE as supervised learning!
Conclusion So Far
• Uncoupled Regression from Pairwise Comparison
• Solves the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X⁺,X⁻}
• Introduced an approach based on risk approximation
• Theoretical and empirical results are given
Modeling CDF from Pairwise Comparison Data
Theoretical Property (Recap)
Theorem 2 [Xu et al. 2019]: For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2}) + M · Err(ŵ1, ŵ2)
In particular, if Y ∼ Unif[a, b] then Err(b/2, a/2) = 0
→ We can learn the optimal model for uniform Y
Predicting Percentile
• Optimizing Direct Marketing
• X: customer features, Y: probability of purchase
• Send discount tickets to the top 1% of potential customers
• The CDF F_Y(Y) is more the target of interest than Y itself
• Predicting Y might not be the best idea…
• Due to class imbalance, all Y can be very small
Predicting Percentile
• Sometimes the percentile is the target of interest
• Learn f(X) that minimizes R(f) = 𝔼[(F_Y(Y) − f(X))²]
• F_Y(Y) follows Unif[0,1]
→ We can learn the optimal f from pairwise comparison
Motivating Example for Predicting Percentile
• Online Chess Rating
• X: user attributes, Y: abstract measure of “skill”
• Skill is compared through games
• Pairwise comparison data is given naturally
• We want to know the percentile in the skill ranking
Simple Solution
• Problem (Recap)
• Given pairwise comparison data (X⁺, X⁻)
• Predict the conditional expectation of the CDF, 𝔼[F_Y(Y)|X]
• Simple Solution
• Learn a ranking model r(X) from (X⁺, X⁻)
• Transform r(X) to 𝔼[F_Y(Y)|X]
Pairwise-Ranking based Approach
• Pairwise Learning to Rank
• Learn a ranker r(X) which minimizes the rank loss
• e.g. SVMRank, RankBoost
• Given test data X_test and the rank model,
𝔼[F_Y(Y)|X_test] ≃ (rank of X_test in the entire data) / (number of data points)
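A sketch of this conversion (my own illustration; the scores below are synthetic stand-ins for the outputs of any pairwise ranker such as SVMRank):

```python
import numpy as np
from scipy.stats import rankdata

def percentile_from_ranker(scores):
    """E[F_Y(Y)|x] ≈ rank of x's score in the data pool / pool size."""
    return rankdata(scores) / len(scores)

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)   # pretend these are r(X) on a data pool
print(percentile_from_ranker(scores)[:5])
```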
Weakness of the Pairwise-Ranking based Approach
• The original goal is to minimize R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
• The rank model r(X) minimizes the rank loss R_r(r)
• A small R_r(r) does not necessarily mean a small R(f)
→ We aim to directly minimize R(f)
Direct Minimization
Lemma 1 [Xu et al. 2019]: For any function h,
𝔼_{X⁺}[h(X⁺)] = 2𝔼_{X,Y}[F_Y(Y) h(X)]
𝔼_{X⁻}[h(X⁻)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) h(X)]
From this lemma, we have
R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[F_Y(Y) f(X)] + const.
     = 𝔼_X[f²(X)] − 𝔼_{X⁺}[f(X⁺)] + const.
Empirical Approximation
• The original loss R(f) (without the constant):
R(f) = 𝔼_X[f²(X)] − 𝔼_{X⁺}[f(X⁺)]
• The empirical loss R̂(f):
R̂(f) = (1/|D_X|) Σ_{x_i ∈ D_X} f²(x_i) − (1/|D_{X⁺,X⁻}|) Σ_{(x⁺_i, x⁻_i) ∈ D_{X⁺,X⁻}} f(x⁺_i)
• The deviation is bounded:
R(f) ≤ R̂(f) + O_p(1/|D_X|^{1/2} + 1/|D_{X⁺,X⁻}|^{1/2})
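For a linear percentile model f(x) = θᵀx, R̂(f) is again quadratic in θ, namely θᵀAθ − θᵀb with b the mean of the x⁺ samples, so the minimizer is θ = A⁻¹b/2. A minimal sketch under that linear-model assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 10000
X = rng.normal(size=(n, 2))
Y = X @ np.array([2.0, 0.0]) + 0.5 * rng.normal(size=n)

# comparison data built as before: X+ is the winner's x
i, j = rng.integers(0, n, size=m), rng.integers(0, n, size=m)
x_pos = np.where((Y[i] > Y[j])[:, None], X[i], X[j])

A = X.T @ X / n                      # empirical E[X Xᵀ]
b = x_pos.mean(axis=0)               # empirical E[X+]
theta = 0.5 * np.linalg.solve(A, b)  # minimizer of the empirical loss
print(theta)  # f(x) = θᵀx should increase with the first feature, like F_Y(Y)
```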
Summary
• We can learn 𝔼[F_Y(Y)|X] only from D_X and D_{X⁺,X⁻}
• The empirical loss to minimize is
R̂(f) = (1/|D_X|) Σ_{x_i ∈ D_X} f²(x_i) − (1/|D_{X⁺,X⁻}|) Σ_{(x⁺_i, x⁻_i) ∈ D_{X⁺,X⁻}} f(x⁺_i)
Can we use this for the original regression problem?
Target Transformation Approach
Target Transformation
• From the previous discussion,
• We can learn the optimal model for F_Y(Y)
• We can learn the CDF function F_Y
• Target Transformation Approach [Xu et al. 2019]
1. Learn the function F̂ that minimizes R_F(F) = 𝔼_{X,Y}[(F_Y(Y) − F(X))²]
2. Output the regression model as f̂ = F_Y^{−1}(F̂(X))
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
The CDF F_Y is estimated via the empirical CDF of D_Y.
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
The CDF model F̂ is learned by
F̂ = arg min_F (1/|D_X|) Σ_{i=1}^{|D_X|} F(x_i)² − (1/|D_{X⁺,X⁻}|) Σ_{j=1}^{|D_{X⁺,X⁻}|} F(x⁺_j),
where the first term estimates 𝔼_X[F²(X)] and the second estimates 2𝔼_{X,Y}[F_Y(Y) F(X)].
Target Transformation
• Step 1: Estimate the CDF F̂_Y
• Step 2: Learn the CDF model F̂
• Step 3: Learn the regression model f̂
The regression model f̂ is obtained by
f̂ = F_Y^{−1}(F̂(X))
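Step 3 maps CDF-scale predictions back to the target scale; with the empirical CDF of D_Y, the inverse F_Y^{−1} is just an empirical quantile. A minimal sketch (F_hat is a hypothetical learned CDF model, standing in for the output of Step 2):

```python
import numpy as np

def target_transform(F_hat, x, D_Y):
    """f_hat(x) = empirical quantile of D_Y at the predicted percentile."""
    u = np.clip(F_hat(x), 0.0, 1.0)   # predicted F_Y(Y), clipped to [0, 1]
    return np.quantile(D_Y, u)        # empirical inverse CDF of D_Y

rng = np.random.default_rng(0)
D_Y = rng.exponential(size=1000)                # target set
F_hat = lambda x: 1 / (1 + np.exp(-x[:, 0]))    # hypothetical CDF model
x_test = rng.normal(size=(5, 2))
print(target_transform(F_hat, x_test, D_Y))
```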
Experiment on UCI
• RA: Risk Approximation
• TT: Target Transformation
• SVMRank: TT approach where F̂ is learned based on SVMRank
Conclusion
• Uncoupled Regression from Pairwise Comparison
• Solves the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X⁺,X⁻}
• Approach based on risk approximation
• Theoretical and empirical results are given
• Approach based on target transformation
• (Theoretical) and empirical results are given
Thank you!
• Follow me on Twitter! (@ly9988)