# Learning bounds for risk-sensitive learning




### Learning bounds for risk-sensitive learning

• 1. Learning bounds for risk-sensitive learning … or, “Robust and Fair ML with Vapnik & Chervonenkis”
  Jaeho Lee, Sejun Park, Jinwoo Shin. Korea Advanced Institute of Science and Technology (KAIST).
  Contact: jaeho-lee@kaist.ac.kr
  Code: https://github.com/jaeho-lee/oce
• 2. Motivation: Robust and fair learning
  Truth. Empirical risk minimization (ERM) is a theoretical foundation for ML.
  $$\hat{f}_{\mathsf{erm}} \triangleq \operatorname*{argmin}_{f \in \mathcal{F}} \sum_{i=1}^{n} \frac{1}{n} \cdot f(Z_i)$$
• 3. Motivation: Robust and fair learning
  Also truth. Modern-day ML is more than just ERM: we weight samples differently, based on their loss values!
  $$\hat{f} \triangleq \operatorname*{argmin}_{f \in \mathcal{F}} \sum_{i=1}^{n} w_i \cdot f(Z_i)$$
  Here each weight $w_i$ depends on $f(Z_i)$, relative to $f(Z_1), f(Z_2), \cdots, f(Z_n)$.
• 4. Motivation: Robust and fair learning
  Examples.
  - Robust learning with outliers / noisy labels (high-loss samples are ignored) [1]
  - Curriculum learning (low-loss samples are prioritized) [2]
  - Fair ML, with individual fairness criteria (low-loss samples are ignored) [3]
  [1] e.g., Han et al., “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” NeurIPS 2018.
  [2] e.g., Pawan Kumar et al., “Self-paced learning for latent variable models,” NeurIPS 2010.
  [3] e.g., Williamson et al., “Fairness risk measures,” ICML 2019.
• 5. Motivation: Robust and fair learning
  Question. Can we give convergence guarantees for algorithms with loss-dependent weights?
  Challenge. What theoretical framework should we use?
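To make the idea of loss-dependent weights concrete, here is a minimal Python sketch. The function names and the toy loss values are illustrative assumptions, not taken from the slides. It contrasts uniform ERM weights with weights that ignore the highest losses (as in robust learning) and weights that ignore the lowest losses (as in the fairness example).

```python
def erm_weights(losses):
    """Uniform weights 1/n, as in plain ERM."""
    n = len(losses)
    return [1.0 / n] * n

def keep_smallest_weights(losses, keep):
    """Uniform weight on the `keep` smallest losses; high-loss samples get weight 0."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    weights = [0.0] * len(losses)
    for i in order[:keep]:
        weights[i] = 1.0 / keep
    return weights

def keep_largest_weights(losses, keep):
    """Uniform weight on the `keep` largest losses; low-loss samples get weight 0."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    weights = [0.0] * len(losses)
    for i in order[:keep]:
        weights[i] = 1.0 / keep
    return weights

def weighted_objective(losses, weights):
    """The weighted sum of per-sample losses that the learner minimizes."""
    return sum(w * x for w, x in zip(weights, losses))

# Toy per-sample losses f(Z_i); the last one plays the role of an outlier.
losses = [0.1, 0.2, 0.3, 5.0]
erm_value = weighted_objective(losses, erm_weights(losses))
robust_value = weighted_objective(losses, keep_smallest_weights(losses, keep=3))
fair_value = weighted_objective(losses, keep_largest_weights(losses, keep=2))
```

In every case the weights depend on how each loss ranks among the others, which is what separates these objectives from ERM.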
• 6. Framework: Optimized Certainty Equivalents (OCE)
  History. Invented by Ben-Tal and Teboulle (1986) to characterize risk-aversion.
  - Extends the utility-theoretic perspective of von Neumann and Morgenstern.
  [Figure: utility curve showing diminishing marginal utility. Horizontal axis: income (objective). Vertical axis: utility (subjective). Utility increments Δ1, Δ2, Δ3 are marked.]
• 7. Framework: Optimized Certainty Equivalents (OCE)
  Definition. Capture the risk-averse behavior using a convex disutility function $\phi$ (i.e., negative utility):
  $$\mathsf{oce}(f, P) \triangleq \inf_{\lambda \in \mathbb{R}} \left\{ \lambda + E_P[\phi(f(Z) - \lambda)] \right\}$$
  Here $\lambda$ is the certain present loss and $E_P[\phi(f(Z) - \lambda)]$ is the uncertain future disutility.
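The definition can be evaluated on a finite sample by minimizing over the scalar $\lambda$. The following is a minimal sketch, not the authors' implementation: the function name, the ternary search, and the search interval are assumptions. The interval between the smallest and largest loss contains a minimizer when $\phi$ is convex and nondecreasing with $\phi(0) = 0$ and slope 1 admissible at 0, which is also assumed here.

```python
def empirical_oce(losses, phi, iters=200):
    """Approximate inf over lam of lam + mean(phi(loss - lam)) by ternary search.

    Assumes phi is convex, so the objective is convex in lam, and that a
    minimizer lies between the smallest and largest loss.
    """
    n = len(losses)

    def objective(lam):
        return lam + sum(phi(x - lam) for x in losses) / n

    lo, hi = min(losses), max(losses)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2  # a minimizer of a convex function lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    lam = (lo + hi) / 2.0
    return objective(lam), lam

losses = [1.0, 2.0, 3.0, 4.0]

# phi(t) = t recovers the plain average loss.
mean_value, _ = empirical_oce(losses, lambda t: t)

# phi(t) = max(t, 0) / alpha gives the conditional value-at-risk of the
# worst alpha fraction; with alpha = 0.5 this is approximately 3.5,
# the mean of the two largest losses.
cvar_value, cvar_anchor = empirical_oce(losses, lambda t: max(t, 0.0) / 0.5)
```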
• 8. Framework: Optimized Certainty Equivalents (OCE)
  ML view. We are penalizing the average loss + deviation!
  $$\mathsf{oce}(f, P) = E_P[f(Z)] + \inf_{\lambda \in \mathbb{R}} \left\{ E_P[\varphi(f(Z) - \lambda)] \right\}$$
  … for some convex $\varphi(t) = \phi(t) - t$.
  [Figure: losses $f(Z_{\mathsf{low\text{-}loss}})$ and $f(Z_{\mathsf{high\text{-}loss}})$ on either side of the optimized anchor $\lambda^*$; the second term is a “deviation penalty” from $\lambda^*$.]
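The ML view follows from the definition in one step. Writing $\phi$ for the disutility and $\varphi(t) = \phi(t) - t$, as on the slides, substitute $\phi(t) = \varphi(t) + t$ inside the expectation:

```latex
\begin{aligned}
\lambda + E_P[\phi(f(Z) - \lambda)]
  &= \lambda + E_P[\varphi(f(Z) - \lambda) + (f(Z) - \lambda)] \\
  &= E_P[f(Z)] + E_P[\varphi(f(Z) - \lambda)].
\end{aligned}
```

The term $E_P[f(Z)]$ does not depend on $\lambda$, so taking the infimum over $\lambda$ on both sides gives $\mathsf{oce}(f, P) = E_P[f(Z)] + \inf_{\lambda \in \mathbb{R}} E_P[\varphi(f(Z) - \lambda)]$.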
• 9. Framework: Optimized Certainty Equivalents (OCE)
  Examples. This framework covers a wide range of “risk-averse” measures of loss.
  - Average + variance penalty [1]
  - Conditional value-at-risk (i.e., ignore low-loss samples) [2]
  - Entropic risk measure (i.e., exponentially tilted loss) [3]
  Note: OCE is complementary to rank-based approaches (come to our poster session for details!).
  [1] e.g., Maurer and Pontil, “Empirical Bernstein bounds and sample variance penalization,” COLT 2009.
  [2] e.g., Curi et al., “Adaptive sampling for stochastic risk-averse learning,” NeurIPS 2020.
  [3] e.g., Li et al., “Tilted empirical risk minimization,” arXiv 2020.
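The slides list these three measures by name only. The sketch below assumes standard disutility parameterizations for them (the constants `c`, `alpha`, `gamma` and all function names are illustrative, not from the slides) and checks numerically that each one matches its familiar closed form when the anchor is set to its known optimum.

```python
import math

def oce_objective(losses, phi, lam):
    """lam + mean(phi(loss - lam)) for a fixed anchor lam."""
    return lam + sum(phi(x - lam) for x in losses) / len(losses)

losses = [0.5, 1.0, 1.5, 4.0]
n = len(losses)
mean = sum(losses) / n
variance = sum((x - mean) ** 2 for x in losses) / n

# 1) Average + variance penalty: assumed phi(t) = t + c * t^2, optimal anchor = mean.
c = 0.5
mean_variance = oce_objective(losses, lambda t: t + c * t * t, mean)

# 2) Conditional value-at-risk: assumed phi(t) = max(t, 0) / alpha, where alpha
#    is the tail fraction. With alpha = 0.5, any anchor between the two middle
#    losses is optimal; 1.5 is used here.
alpha = 0.5
cvar = oce_objective(losses, lambda t: max(t, 0.0) / alpha, 1.5)

# 3) Entropic risk: assumed phi(t) = (exp(gamma * t) - 1) / gamma, with optimal
#    anchor log(mean(exp(gamma * loss))) / gamma, which is also the OCE value.
gamma = 1.0
entropic_anchor = math.log(sum(math.exp(gamma * x) for x in losses) / n) / gamma
entropic = oce_objective(
    losses, lambda t: (math.exp(gamma * t) - 1.0) / gamma, entropic_anchor
)

# Sanity check: no anchor on a coarse grid does better than the chosen ones.
grid = [i / 100.0 for i in range(0, 501)]
best_on_grid_cvar = min(
    oce_objective(losses, lambda t: max(t, 0.0) / alpha, lam) for lam in grid
)
```

Under these assumptions, `mean_variance` equals `mean + c * variance`, `cvar` equals the average of the two largest losses, and `entropic` equals `entropic_anchor`.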
• 10. Framework: Optimized Certainty Equivalents (OCE)
  Inverted OCE. A new notion to address “risk-seeking” algorithms (e.g., ignore high-loss samples). The inverted OCE is defined as
  $$\mathsf{oce}(f, P) \triangleq E_P[f(Z)] - \inf_{\lambda \in \mathbb{R}} \left\{ E_P[\varphi(\lambda - f(Z))] \right\}$$
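As a small illustration of the inverted OCE, the sketch below uses the CVaR-style penalty $\varphi(t) = \max(t, 0)/\alpha - t$. That choice, the function names, and the toy losses are assumptions made for illustration. With this penalty the inverted OCE becomes the average of the lowest $\alpha$ fraction of losses, so the outlier is ignored. Because the objective is concave and piecewise linear in $\lambda$ with kinks at the sample losses, searching over the sample losses is enough.

```python
def inverted_oce_objective(losses, varphi, lam):
    """mean(loss) - mean(varphi(lam - loss)) for a fixed anchor lam."""
    n = len(losses)
    return sum(losses) / n - sum(varphi(lam - x) for x in losses) / n

def inverted_oce(losses, varphi, anchors):
    """Subtracting the infimum of the penalty equals taking the supremum
    of the objective over the candidate anchors."""
    return max(inverted_oce_objective(losses, varphi, lam) for lam in anchors)

alpha = 0.5  # fraction of lowest-loss samples that is kept

def cvar_penalty(t):
    # varphi(t) = phi(t) - t with phi(t) = max(t, 0) / alpha
    return max(t, 0.0) / alpha - t

losses = [1.0, 2.0, 3.0, 100.0]  # the last sample is an outlier
value = inverted_oce(losses, cvar_penalty, anchors=losses)
plain_mean = sum(losses) / len(losses)
```

Here `value` is the mean of the two smallest losses, while `plain_mean` is dominated by the outlier.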
• 11. Results: Two learning bounds.
  What we do. We analyze the empirical OCE minimization procedure,
  $$\hat{f}_{\mathsf{eom}} \triangleq \operatorname*{argmin}_{f \in \mathcal{F}} \mathsf{oce}(f, P_n),$$
  just as Vapnik & Chervonenkis studied “empirical risk minimization.” We also give an inverted OCE version.
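The following toy sketch contrasts empirical OCE minimization with ERM over a small finite hypothesis class. Everything specific here is an assumption made for illustration: the constant predictors, the squared loss, the data, and the use of CVaR as the OCE. For CVaR the objective is piecewise linear and convex in $\lambda$, so the infimum is attained at one of the sample losses and can be found by enumeration.

```python
def empirical_cvar(losses, alpha):
    """Empirical OCE with phi(t) = max(t, 0) / alpha, minimized over anchors
    taken from the sample losses (where the piecewise-linear objective kinks)."""
    n = len(losses)
    return min(
        lam + sum(max(x - lam, 0.0) for x in losses) / (alpha * n)
        for lam in losses
    )

def squared_losses(prediction, data):
    """Per-sample losses f(Z_i) of a constant predictor."""
    return [(z - prediction) ** 2 for z in data]

data = [0.0, 0.0, 0.0, 4.0]
candidates = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical finite hypothesis class

# ERM: minimize the average loss.
f_erm = min(
    candidates,
    key=lambda c: sum(squared_losses(c, data)) / len(data),
)

# Empirical OCE minimization with CVaR at alpha = 0.25 (worst quarter of samples).
f_eom = min(
    candidates,
    key=lambda c: empirical_cvar(squared_losses(c, data), alpha=0.25),
)
```

On this data ERM selects the sample mean, whereas the CVaR minimizer moves toward the midpoint to reduce the worst-case loss.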
• 12. Results: Two learning bounds.
  In a nutshell. We give learning bounds of two different types.
  Theorem 3. Excess OCE bound
  $$\mathsf{oce}(\hat{f}_{\mathsf{eom}}, P) - \inf_{f \in \mathcal{F}} \mathsf{oce}(f, P) \approx \mathcal{O}\left( \frac{\mathsf{Lip}(\phi) \cdot \mathsf{comp}(\mathcal{F})}{n} \right)$$
  Theorem 6. Excess expected loss bound
  $$E_P[\hat{f}_{\mathsf{eom}}(Z)] - \inf_{f \in \mathcal{F}} E_P[f(Z)] \approx \mathcal{O}\left( \frac{\mathsf{comp}(\mathcal{F})}{n} \right)$$
  (come to our poster session for details!)
• 13. Results: Two learning bounds.
  Also… We discover the relationship to the sample variance penalization (SVP) procedure, and find that SVP is a nice baseline strategy for batch-based OCE minimization. (come to our poster session for details!)
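The slides do not state the SVP objective. The sketch below assumes the form given by Maurer and Pontil (cited earlier in the deck): the empirical mean plus a multiple of the square root of the sample variance divided by the sample size. The function name, the constant `kappa`, and the toy batch are illustrative, and how SVP is used as a baseline in the experiments is not shown here.

```python
import math
import statistics

def svp_objective(batch_losses, kappa):
    """Sample variance penalization on one batch: mean + kappa * sqrt(variance / n).

    Uses the unbiased sample variance, so the batch needs at least two losses.
    """
    n = len(batch_losses)
    mean = statistics.mean(batch_losses)
    variance = statistics.variance(batch_losses)
    return mean + kappa * math.sqrt(variance / n)

batch = [1.0, 2.0, 3.0]
penalized = svp_objective(batch, kappa=1.0)
unpenalized = svp_objective(batch, kappa=0.0)
```

Setting `kappa` to zero recovers the plain batch average, so the penalty term is what makes the objective risk-sensitive.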
• 14. Results: Two learning bounds.
  TL;DR.
  - We give an OCE-based theoretical framework to address robust/fair ML.
  - We give excess risk bounds for empirical OCE minimizers.
  Come to our zoom session for interesting details, including…
  - Further implications of our theoretical results…
  - Proof ideas…
  - Experiment details…
  - Comparisons with alternative frameworks…