Learning bounds for risk-sensitive learning

Jaeho Lee Sejun Park Jinwoo Shin

Korea Advanced Institute of Science and Technology (KAIST)
… or, “Robust and Fair ML with Vapnik & Chervonenkis”
Contact: jaeho-lee@kaist.ac.kr
Code: https://github.com/jaeho-lee/oce

Motivation: Robust and fair learning
Truth. Empirical risk minimization (ERM) gives a concrete theoretical foundation for ML.
\[ \hat{f}_{\mathsf{erm}} \triangleq \operatorname*{argmin}_{f \in \mathcal{F}} \sum_{i=1}^{n} \frac{1}{n} \cdot f(Z_i) \]
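Also truth. Modern-day ML is more than just ERM.
- We weigh samples differently, based on their loss values!
\[ \hat{f} \triangleq \operatorname*{argmin}_{f \in \mathcal{F}} \sum_{i=1}^{n} w_i \cdot f(Z_i) \]
Here each weight $w_i$ depends on $f(Z_i)$, relative to $f(Z_1), f(Z_2), \cdots, f(Z_n)$.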
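To make "loss-dependent weights" concrete, here is a minimal sketch in plain Python. The function names, the `scheme` labels, and the `keep_frac` parameter are hypothetical choices made for illustration; they are not taken from the talk or its code release. The sketch only shows weights that depend on each loss relative to the others.

```python
def loss_dependent_weights(losses, scheme, keep_frac=0.5):
    """Return weights w_i (summing to 1) that depend on the rank of each loss.

    scheme="erm":       uniform weights 1/n (plain ERM)
    scheme="drop_high": weight only the lowest-loss fraction of samples
    scheme="drop_low":  weight only the highest-loss fraction of samples
    All names here are illustrative assumptions.
    """
    n = len(losses)
    k = max(1, round(keep_frac * n))                     # samples that keep weight
    order = sorted(range(n), key=lambda i: losses[i])    # indices by increasing loss
    if scheme == "erm":
        return [1.0 / n] * n
    if scheme == "drop_high":
        kept = set(order[:k])
    elif scheme == "drop_low":
        kept = set(order[-k:])
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [1.0 / k if i in kept else 0.0 for i in range(n)]


def weighted_loss(losses, weights):
    """Weighted objective: sum_i w_i * f(Z_i)."""
    return sum(w * l for w, l in zip(weights, losses))


losses = [0.1, 0.2, 0.4, 5.0]  # the last sample looks like an outlier
for scheme in ("erm", "drop_high", "drop_low"):
    w = loss_dependent_weights(losses, scheme)
    print(scheme, w, weighted_loss(losses, w))
```

With these four losses, the uniform scheme averages everything, while the other two schemes average only the two smallest or the two largest losses.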

Examples.
- Robust learning with outliers / noisy labels (high-loss samples are ignored) [1]
- Curriculum learning (low-loss samples are prioritized) [2]
- Fair ML, with individual fairness criteria (low-loss samples are ignored) [3]
[1] e.g., Han et al., “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” NeurIPS 2018.
[2] e.g., Kumar et al., “Self-paced learning for latent variable models,” NeurIPS 2010.
[3] e.g., Williamson et al., “Fairness risk measures,” ICML 2019.

Question. Can we give convergence guarantees for algorithms with loss-dependent weights?
Challenge. What theoretical framework should we use?

Framework: Optimized Certainty Equivalents (OCE)
History. Invented by Ben-Tal and Teboulle (1986) to characterize risk-aversion.

- Extends the utility-theoretic perspective of von Neumann and Morgenstern.
[Figure: utility curve with diminishing marginal utility. Horizontal axis: income (objective). Vertical axis: utility (subjective). Increments Δ1, Δ2, Δ3 are marked.]


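Definition. Capture risk-averse behavior using a convex disutility function $\phi$ (i.e., negative utility):
\[ \mathsf{oce}(f, P) \triangleq \inf_{\lambda \in \mathbb{R}} \left\{ \lambda + E_P[\phi(f(Z) - \lambda)] \right\} \]
- $\lambda$: certain present loss
- $E_P[\phi(f(Z) - \lambda)]$: uncertain future disutility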
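The definition can be evaluated on a finite sample by minimizing over the scalar λ. The sketch below is one way to do that in plain Python, under stated assumptions: the disutility is convex with value 0 and slope 1 at the origin, so the objective is convex in λ and a minimizer lies between the smallest and largest loss. The function name `empirical_oce`, the ternary search, and the quadratic disutility used in the demo are illustrative choices, not the authors' implementation.

```python
def empirical_oce(losses, phi, iters=200):
    """Approximate min over lam of { lam + mean(phi(loss - lam)) }.

    Assumes phi is convex with phi(0) = 0 and slope 1 at 0, so the
    objective is convex in lam with a minimizer in [min(losses), max(losses)].
    Returns (approximate OCE value, approximate minimizing lam).
    """
    n = len(losses)

    def objective(lam):
        return lam + sum(phi(l - lam) for l in losses) / n

    lo, hi = min(losses), max(losses)
    for _ in range(iters):              # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    lam_star = 0.5 * (lo + hi)
    return objective(lam_star), lam_star


losses = [0.1, 0.2, 0.4, 5.0]

# phi(t) = t is the risk-neutral case: the objective is the plain average.
neutral_value, _ = empirical_oce(losses, lambda t: t)

# phi(t) = t + c * t^2 (an assumed example) penalizes squared deviation from lam.
c = 0.5
averse_value, anchor = empirical_oce(losses, lambda t: t + c * t * t)

print(neutral_value, averse_value, anchor)
```

For the quadratic disutility the minimizing λ is the sample mean, so the result is the average loss plus `c` times the (population) sample variance; the single large loss raises the value well above the plain average.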


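ML view. We are penalizing the average loss + deviation!
\[ \mathsf{oce}(f, P) = E_P[f(Z)] + \inf_{\lambda \in \mathbb{R}} \left\{ E_P[\varphi(f(Z) - \lambda)] \right\} \]
… for some convex $\varphi(t) = \phi(t) - t$. The second term is a “deviation penalty” measured from the optimized anchor $\lambda^*$.
[Figure: number line marking $\lambda^*$, $f(Z_{\mathsf{high\text{-}loss}})$, and $f(Z_{\mathsf{low\text{-}loss}})$.]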
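The step from the definition to the "average + deviation" form is a one-line substitution. A short derivation, using only $\phi(t) = \varphi(t) + t$:

```latex
\begin{align*}
\mathsf{oce}(f, P)
  &= \inf_{\lambda \in \mathbb{R}} \left\{ \lambda + E_P[\phi(f(Z) - \lambda)] \right\} \\
  &= \inf_{\lambda \in \mathbb{R}} \left\{ \lambda + E_P[\varphi(f(Z) - \lambda)] + E_P[f(Z)] - \lambda \right\} \\
  &= E_P[f(Z)] + \inf_{\lambda \in \mathbb{R}} E_P[\varphi(f(Z) - \lambda)].
\end{align*}
```

The $\lambda$ terms cancel, and $E_P[f(Z)]$ does not depend on $\lambda$, so it moves outside the infimum.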


Examples. This framework covers a wide range of “risk-averse” measures of loss.
- Average + variance penalty [1]
- Conditional value-at-risk (i.e., ignore low-loss samples) [2]
- Entropic risk measure (i.e., exponentially tilted loss) [3]
Note: OCE is complementary to rank-based approaches (come to our poster session for details!)
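For reference, the three examples correspond to the following disutility functions, as they are commonly written in the OCE literature. The parameters $c > 0$, $\alpha \in (0, 1]$, and $\gamma > 0$ are notation assumed here; they do not appear on the slide.

```latex
\begin{align*}
\text{Mean-variance:} \quad & \phi(t) = t + c\,t^2
  && \Rightarrow \quad \mathsf{oce}(f, P) = E_P[f(Z)] + c \cdot \mathrm{Var}_P[f(Z)] \\
\text{CVaR:} \quad & \phi(t) = \tfrac{1}{\alpha} \max\{t, 0\}
  && \Rightarrow \quad \mathsf{oce}(f, P) = \inf_{\lambda \in \mathbb{R}} \left\{ \lambda + \tfrac{1}{\alpha} E_P[\max\{f(Z) - \lambda, 0\}] \right\} \\
\text{Entropic:} \quad & \phi(t) = \tfrac{1}{\gamma} \left( e^{\gamma t} - 1 \right)
  && \Rightarrow \quad \mathsf{oce}(f, P) = \tfrac{1}{\gamma} \log E_P\!\left[ e^{\gamma f(Z)} \right]
\end{align*}
```

In the mean-variance case the optimal anchor is $\lambda^* = E_P[f(Z)]$; in the entropic case, setting the derivative in $\lambda$ to zero gives $e^{\gamma \lambda^*} = E_P[e^{\gamma f(Z)}]$. The CVaR expression averages the worst $\alpha$-fraction of losses, which is why low-loss samples are ignored.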
[1] e.g., Maurer and Pontil, “Empirical Bernstein bounds and sample variance penalization,” COLT 2009.
[2] e.g., Curi et al., “Adaptive sampling for stochastic risk-averse learning,” NeurIPS 2020.
[3] e.g., Li et al., “Tilted empirical risk minimization,” arXiv 2020.


Inverted OCE. A new notion to address “risk-seeking” algorithms (e.g., ignore high-loss samples):
\[ \overline{\mathsf{oce}}(f, P) \triangleq E_P[f(Z)] - \inf_{\lambda \in \mathbb{R}} \left\{ E_P[\varphi(\lambda - f(Z))] \right\} \]
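A small numerical sketch of the inverted OCE on a finite sample, in plain Python. It assumes the CVaR-style choice φ(t) = max(t, 0)/α − t, with α a hypothetical parameter, and uses a ternary search over λ; the function names are illustrative and not taken from the authors' code. Under this choice the inverted OCE reduces to the average of the lowest α-fraction of losses, i.e., high-loss samples are ignored.

```python
def ternary_min(objective, lo, hi, iters=200):
    """Approximate the minimum value of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


def inverted_oce(losses, varphi):
    """Empirical inverted OCE: mean(loss) - min over lam of mean(varphi(lam - loss)).

    Assumes varphi is convex and minimized at 0, so a minimizing lam lies
    in [min(losses), max(losses)].
    """
    n = len(losses)
    mean_loss = sum(losses) / n

    def penalty(lam):
        return sum(varphi(lam - l) for l in losses) / n

    return mean_loss - ternary_min(penalty, min(losses), max(losses))


alpha = 0.5  # assumed parameter: fraction of lowest-loss samples that count


def cvar_varphi(t):
    """varphi(t) = phi(t) - t for the CVaR-style disutility phi(t) = max(t, 0) / alpha."""
    return max(t, 0.0) / alpha - t


losses = [0.1, 0.2, 0.4, 5.0]  # the last sample looks like an outlier
print(inverted_oce(losses, cvar_varphi))
```

With α = 0.5 and these four losses, the printed value is close to 0.15, the mean of the two smallest losses; the outlier at 5.0 has no influence.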

Results: Two learning bounds.
What we do. We analyze the empirical OCE minimization procedure, just as Vapnik & Chervonenkis studied “empirical risk minimization”:
\[ \hat{f}_{\mathsf{eom}} \triangleq \operatorname*{argmin}_{f \in \mathcal{F}} \mathsf{oce}(f, P_n) \]
We also give an inverted OCE version.

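In a nutshell. We give learning bounds of two different types.
Theorem 3. Excess OCE bound
\[ \mathsf{oce}(\hat{f}_{\mathsf{eom}}, P) - \inf_{f \in \mathcal{F}} \mathsf{oce}(f, P) \approx \mathcal{O}\!\left( \frac{\mathsf{Lip}(\phi) \cdot \mathsf{comp}(\mathcal{F})}{\sqrt{n}} \right) \]
Theorem 6. Excess expected loss bound
\[ E_P[\hat{f}_{\mathsf{eom}}(Z)] - \inf_{f \in \mathcal{F}} E_P[f(Z)] \approx \mathcal{O}\!\left( \frac{\mathsf{comp}(\mathcal{F})}{\sqrt{n}} \right) \]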

Also… We discover a relationship to the sample variance penalization (SVP) procedure, and find that SVP is a nice baseline strategy for batch-based OCE minimization.
TL;DR.
- We give an OCE-based theoretical framework to address robust/fair ML.
- We give excess risk bounds for empirical OCE minimizers.
Come to our Zoom session for interesting details, including…
- Further implications of our theoretical results…
- Proof ideas…
- Experiment details…
- Comparisons with alternative frameworks…
