
# Introduction of “Fairness in Learning: Classic and Contextual Bandits”


This material is an introduction to the paper "Fairness in Learning: Classic and Contextual Bandits" from NIPS 2016. It was presented at https://connpass.com/event/47580/.


1. Introduction of “Fairness in Learning: Classic and Contextual Bandits”, authored by Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth. NIPS2016-Yomi, January 19, 2017. Presenter: Kazuto Fukuchi
2. Fairness in Machine Learning. Consequential decisions made with machine learning may lead to unfair treatment. E.g., Google’s ad suggestion system [Sweeney 13] served negative ads (“Arrested?”) alongside African-descent names and neutral ads (“Located”) alongside European-descent names. This talk: fairness in the contextual bandit problem.
3. Individual fairness. Among $K$ persons, choose one person on whom to conduct an action, e.g., grant a loan, hire, or admit. When may we preferentially choose one person? Only if that person has the greatest ability (e.g., payback probability 90% > 60%); there is no other acceptable reason for a preferential choice.
4. Contextual Bandit Problem. There are $K$ arms with reward functions $f_1, \dots, f_K$ unknown to the learner. In each round $t$: (1) obtain a context $x_j^t$ for each arm $j$; (2) choose one arm $i^t \in [K]$; (3) observe a reward $r_{i^t}^t$ such that $\mathbb{E}[r_j^t] = f_j(x_j^t)$ and $r_j^t \in [0,1]$ a.s. Goal: maximize the expected cumulative reward $\mathbb{E}[\sum_t r_{i^t}^t] = \mathbb{E}[\sum_t f_{i^t}(x_{i^t}^t)]$.
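
To make the protocol concrete, here is a minimal simulation sketch. The sigmoid reward functions, Bernoulli rewards, uniform placeholder policy, and the names `K`, `T`, `d`, `f` are all illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, d = 5, 1000, 3          # arms, rounds, context dimension (toy values)

W = rng.normal(size=(K, d))   # hidden parameters of the unknown f_j

def f(j, x):
    # Expected reward f_j(x); a toy sigmoid so that f_j(x) lies in [0, 1].
    return 1.0 / (1.0 + np.exp(-W[j] @ x))

cum_reward = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))      # 1. obtain x_j^t for each arm j
    i = int(rng.integers(K))                # 2. choose an arm (placeholder policy)
    r = rng.binomial(1, f(i, contexts[i]))  # 3. observe r with E[r] = f_i(x_i^t)
    cum_reward += r
```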
5. Example: Linear Contextual Bandit. Define $C = \{f_\theta : f_\theta(x) = \langle \theta, x \rangle,\ \theta \in \mathbb{R}^d,\ \|\theta\| \le 1\}$ and $\mathcal{X} = \{x \in \mathbb{R}^d : \|x\| \le 1\}$, and suppose $f_j = f_{\theta_j} \in C$ and $x_j^t \in \mathcal{X}$. E.g., online recommendation: $\theta_j$ is the feature vector of product $j$, $x_j^t$ is the feature vector of user $t$ regarding product $j$, and the score of user $t$ for product $j$ is the inner product $\langle x_j^t, \theta_j \rangle$.
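
A small sketch of this class under the recommendation reading; the sampling scheme and names are assumptions, and note that $\langle \theta, x \rangle$ lies in $[-1, 1]$ here, so mapping scores to rewards in $[0, 1]$ would need an additional (assumed) transformation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K = 4, 3  # context dimension and number of products (toy values)

def to_unit_ball(v):
    # Rescale into the unit ball so that ||theta|| <= 1 and ||x|| <= 1.
    n = np.linalg.norm(v)
    return v / n if n > 1 else v

thetas = [to_unit_ball(rng.normal(size=d)) for _ in range(K)]  # product features
x = to_unit_ball(rng.normal(size=d))                           # user feature

scores = [float(theta @ x) for theta in thetas]  # <x_j^t, theta_j> per product
best = int(np.argmax(scores))                    # highest-scoring product
```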
6. Example: Classic Bandit. The expected reward is $\mathbb{E}[r_j^t] = \mu_j$. Setting $f_j(x_j^t) = \mu_j$ for every $x_j^t$, the contextual bandit reduces to the classic bandit with means $\mu_1, \dots, \mu_K$.
7. Regret. The history $h^t$ is the record of the first $t - 1$ experiences: contexts, arms chosen, and rewards observed. A policy $\pi$ maps $x^t$ and $h^t$ to a distribution over the arms $[K]$; write $\pi_{j|h^t}^t$ for the probability of choosing arm $j$ at round $t$ given $h^t$. Regret is the reward dropped compared to the optimal policy: $\mathrm{Regret}(x^1, \dots, x^T) = \sum_t \big( \max_j f_j(x_j^t) - \mathbb{E}_{i^t \sim \pi^t}[f_{i^t}(x_{i^t}^t)] \big)$. An algorithm has regret bound $R(T)$ if $\max_{x^1, \dots, x^T} \mathrm{Regret}(x^1, \dots, x^T) \le R(T)$.
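
As a worked version of the definition, the following few lines evaluate the regret of a policy when the (normally unknown) values $f_j(x_j^t)$ are available, as in a simulation; the array names are illustrative.

```python
import numpy as np

def regret(fx, pis):
    """Regret as on slide 7. fx[t, j] = f_j(x_j^t); pis[t, j] is the
    policy's probability of choosing arm j at round t."""
    best = fx.max(axis=1)              # max_j f_j(x_j^t): the optimal policy
    achieved = (fx * pis).sum(axis=1)  # E_{i^t ~ pi^t}[f_{i^t}(x_{i^t}^t)]
    return float((best - achieved).sum())
```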
8. Fairness Constraint. It is unfair to preferentially choose one individual without an acceptable reason. A policy $\pi$ is $\delta$-fair if, with probability at least $1 - \delta$, for every round $t$ and all arms $j, j'$: $\pi_{j|h^t} > \pi_{j'|h^t}$ only if $f_j(x_j^t) > f_{j'}(x_{j'}^t)$. That is, arm $j$ may be chosen with strictly higher probability at round $t$ only if its quality is strictly higher.
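
A direct one-round check of this condition, useful for testing a policy in simulation (the real definition quantifies over all rounds jointly with probability $1 - \delta$); the names are illustrative.

```python
def fair_at_round(pi, fx, tol=1e-12):
    """pi[j]: probability of arm j this round; fx[j] = f_j(x_j^t).
    Checks: pi[j] > pi[j'] is allowed only if fx[j] > fx[j']."""
    K = len(pi)
    return all(not (pi[j] > pi[jp] + tol) or fx[j] > fx[jp]
               for j in range(K) for jp in range(K))

# Equal qualities force equal probabilities:
assert fair_at_round([0.5, 0.5], [0.3, 0.3])
assert not fair_at_round([0.6, 0.4], [0.3, 0.3])
```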
9. Intuition of the Fairness Constraint. The optimal policy is fair, but we cannot implement it because $f_1, \dots, f_K$ are unknown. When we cannot distinguish which arms have the higher expected reward, the fairness constraint forces us to choose among those arms uniformly; only arms whose expected reward is lower with high probability may be played with smaller probability.
10. Fairness in the Classic Bandit. Maintain confidence intervals for the expected rewards of the arms. Chain together the arm with the highest upper confidence bound and every arm whose interval (transitively) overlaps it, and choose uniformly from this chained group. Arms whose expected reward is lower than that of the arms in the chained group (their intervals lie entirely below) are excluded.
11. Fair Algorithm for Classic Bandit (FairBandits); a sketch of the strategy appears below.
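
The following is a hedged reconstruction of the chained-confidence-interval strategy of slides 10–11, in the spirit of FairBandits but not the paper's exact pseudocode; the Hoeffding-style width, the initialization, and the Bernoulli simulation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, delta = 5, 5000, 0.01
mu = rng.uniform(0.2, 0.8, K)                 # unknown true means (toy)
counts = np.ones(K)                           # pull each arm once to start
sums = rng.binomial(1, mu).astype(float)

for t in range(K, T):
    means = sums / counts
    # Hoeffding-style confidence width (union bound over arms and rounds).
    width = np.sqrt(np.log(2 * K * T / delta) / (2 * counts))
    lo, hi = means - width, means + width
    # Chain: start from the arm with the highest upper bound, and add any
    # arm whose interval overlaps an interval already in the chain.
    chain = {int(np.argmax(hi))}
    changed = True
    while changed:
        changed = False
        for j in range(K):
            if j not in chain and any(hi[j] >= lo[c] and hi[c] >= lo[j]
                                      for c in chain):
                chain.add(j)
                changed = True
    i = int(rng.choice(sorted(chain)))        # play uniformly over the chain
    sums[i] += rng.binomial(1, mu[i])
    counts[i] += 1
```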
12. Regret Upper Bound. If $\delta < \frac{1}{T}$, then FairBandits has regret $R(T) = O\big(\sqrt{k^3 T \ln \frac{Tk}{\delta}}\big)$. Hence $T = \Omega(k^3)$ rounds are required to obtain non-trivial regret, i.e., $\frac{R(T)}{T} \ll 1$. The non-fair rate is $O(\sqrt{kT})$: the fairness constraint turns $k$ into $k^3$. The dependence on $T$ is optimal.
13. Regret Lower Bound. Any fair algorithm suffers constant per-round regret for at least $T = \Omega\big(k^3 \ln \frac{1}{\delta}\big)$ rounds. Since non-trivial regret means the per-round regret becomes small, at least $\Omega(k^3)$ rounds are needed to achieve it. Thus $\Theta(k^3)$ rounds are necessary and sufficient.
14. Fairness in the Contextual Bandit. KWIK learnable = fair-bandit learnable. KWIK (Knows What It Knows) learning is online regression: given a feature $x^t$, the learner outputs either a prediction $\hat{y}^t \in [0,1]$ or $\hat{y}^t = \bot$, where $\bot$ denotes “I don't know”. Only when $\hat{y}^t = \bot$ does the learner observe feedback $y^t$ with $\mathbb{E}[y^t] = f(x^t)$. Whenever the learner answers with a value instead of $\bot$, the input is accurately predictable.
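
The protocol can be written as a small interface; `None` plays the role of $\bot$, and all names below are illustrative scaffolding rather than an API from the paper.

```python
class KWIKLearner:
    """Interface for the KWIK protocol of slide 14."""

    def predict(self, x):
        """Return a prediction in [0, 1], or None for "I don't know"."""
        raise NotImplementedError

    def observe(self, x, y):
        """Noisy feedback y with E[y] = f(x); only given after a None."""
        raise NotImplementedError

def run_kwik(learner, stream):
    bottoms = 0  # number of "I don't know" answers, bounded by m(eps, delta)
    for x, y in stream:
        if learner.predict(x) is None:
            bottoms += 1
            learner.observe(x, y)  # feedback arrives only on a None answer
    return bottoms
```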
15. KWIK Learnable. A class $C$ is $(\epsilon, \delta)$-KWIK learnable with bound $m(\epsilon, \delta)$ if, for every $f \in C$: (1) $\hat{y}^t \in \{\bot\} \cup [f(x^t) - \epsilon,\ f(x^t) + \epsilon]$ for all $t$, with probability $1 - \delta$; and (2) $\sum_{t=1}^{\infty} \mathbb{I}[\hat{y}^t = \bot] \le m(\epsilon, \delta)$. Intuition: the prediction is accurate whenever $\hat{y}^t \neq \bot$, and $\bot$ is answered at most $m(\epsilon, \delta)$ times.
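
As a toy instance of the definition, here is a KWIK learner for the constant class $f(x) = \mu$ (the classic-bandit case of slide 6): it abstains until a Hoeffding bound certifies $\epsilon$-accuracy, giving $m(\epsilon, \delta) = O(\ln(1/\delta)/\epsilon^2)$. This example is mine, not the paper's, and the width glosses over a union bound across rounds.

```python
import numpy as np

class ConstantKWIK:
    """(eps, delta)-KWIK learner for constant functions f(x) = mu."""

    def __init__(self, eps, delta):
        self.eps, self.delta = eps, delta
        self.n, self.total = 0, 0.0

    def predict(self, x):
        if self.n == 0:
            return None  # no data yet: "I don't know"
        width = np.sqrt(np.log(2 / self.delta) / (2 * self.n))
        # Answer only once the empirical mean is eps-accurate w.h.p.
        return self.total / self.n if width <= self.eps else None

    def observe(self, x, y):
        self.n += 1
        self.total += y
```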
16. KWIK Learnability Implies Fair-Bandit Learnability. Suppose $C$ is $(\epsilon, \delta)$-KWIK learnable with bound $m(\epsilon, \delta)$. Then, for $\delta \le \frac{1}{T}$, there is a $\delta$-fair algorithm for $f_j \in C$ such that $R(T) = O\Big(\max\Big\{ k^2\, m\big(\epsilon^*, \tfrac{\min\{\delta, 1/T\}}{T^2 k}\big),\ k^3 \ln \tfrac{k}{\delta} \Big\}\Big)$, where $\epsilon^* = \arg\min_\epsilon \max\big\{ \epsilon T,\ k\, m\big(\epsilon, \tfrac{\min\{\delta, 1/T\}}{T^2 k}\big) \big\}$.
17. Linear Contextual Bandit Case. Let $C = \{f_\theta : f_\theta(x) = \langle \theta, x \rangle,\ \theta \in \mathbb{R}^d,\ \|\theta\| \le 1\}$ and $\mathcal{X} = \{x \in \mathbb{R}^d : \|x\| \le 1\}$. Then $R(T) = O\big(\max\big\{ T^{4/5} k^{6/5} d^{3/5},\ k^3 \ln \tfrac{k}{\delta} \big\}\big)$.
18. KWIK to Fair
19. Intuition of KWIKToFair. Predict the expected rewards $f_j(x_j^t)$ using one KWIK algorithm per arm. When no KWIK algorithm outputs $\bot$, every prediction is within $\epsilon^*$ of the truth, so the same chaining strategy as in the classic bandit applies, with confidence intervals of width $2\epsilon^*$ around the predictions.
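
A sketch of one such round, reusing the `KWIKLearner` interface above; the handling of rounds where some learner abstains (uniform play over all arms, which is trivially fair) follows the spirit of the reduction but may differ from the paper's exact rule.

```python
import numpy as np

def kwik_to_fair_round(learners, contexts, eps_star, rng):
    """One round of the slide-19 idea: learners[j] is a KWIKLearner for
    arm j, contexts[j] = x_j^t. Returns the arm to play."""
    preds = [L.predict(x) for L, x in zip(learners, contexts)]
    if any(p is None for p in preds):
        # Some learner said "I don't know": play uniformly over all arms,
        # which is trivially fair and feeds data to abstaining learners.
        return int(rng.integers(len(learners)))
    # All predictions are eps*-accurate: chain the overlapping intervals
    # [p - eps*, p + eps*] from the top arm, then play uniformly.
    lo = [p - eps_star for p in preds]
    hi = [p + eps_star for p in preds]
    chain = {int(np.argmax(hi))}
    changed = True
    while changed:
        changed = False
        for j in range(len(preds)):
            if j not in chain and any(hi[j] >= lo[c] and hi[c] >= lo[j]
                                      for c in chain):
                chain.add(j)
                changed = True
    return int(rng.choice(sorted(chain)))
```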
20. Fair-Bandit Learnability Implies KWIK Learnability. Suppose there is a $\delta$-fair algorithm for $f_j \in C$ with regret $R(T, \delta)$, and there exist $f \in C$ and $x^{(\ell)} \in \mathcal{X}$ such that $f(x^{(\ell)}) = \ell\epsilon$ for $\ell = 1, \dots, \frac{1}{\epsilon}$. Then there is an $(\epsilon, \delta)$-KWIK algorithm for $C$ whose bound $m(\epsilon, \delta)$ is the solution of $\frac{m(\epsilon, \delta)\,\epsilon}{4} = R\big(m(\epsilon, \delta),\ \frac{\epsilon\delta}{2T}\big)$.
21. An Exponential Separation Between Fair and Unfair Learning. Boolean conjunctions: let $x \in \{0,1\}^d$ and $C = \{f \mid f(x) = x_{i_1} \wedge \cdots \wedge x_{i_k},\ 0 \le k \le d,\ i_1, \dots, i_k \in [d]\}$. Without the fairness constraint, Boolean conjunctions admit $R(T) = O(k^2 d)$. For this $C$, the KWIK bound is at least $m(\epsilon, \delta) = \Omega(2^d)$, so for $\delta < \frac{1}{2T}$ the worst-case regret of any fair algorithm is $R(T) = \Omega(2^d)$.
22. Fair to KWIK
23. Intuition of FairToKWIK. Divide the range of $f(x^t)$ into cells of width $\epsilon^*$, using grid points $x^{(0)}, x^{(1)}, x^{(2)}, \dots$ with $f(x^{(\ell)}) = \ell\epsilon^*$. Using the fair algorithm, present the two-arm instance $(x^t, x^{(\ell)})$ for each $\ell$ and read off its probabilities $p_{\ell,1}$ and $p_{\ell,2}$ of choosing the left and right arm; unequal probabilities reveal whether $f(x^t)$ is above or below $\ell\epsilon^*$. If $p_{\ell,1} \neq p_{\ell,2}$ for all $\ell \neq 3$, say, then $f(x^t)$ lies in the cell around $3\epsilon^*$: output $3\epsilon^*$. Otherwise, output $\bot$.
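
A conceptual sketch of one such query; `fair_probs` is assumed scaffolding that exposes the fair algorithm's selection probabilities on a constructed two-arm instance, not an API from the paper.

```python
def fair_to_kwik_query(fair_probs, x_t, grid_xs, eps_star):
    """grid_xs[l] is the grid point x^(l) with f(x^(l)) = l * eps_star;
    fair_probs(x_a, x_b) returns the fair algorithm's probabilities
    (p1, p2) of choosing each arm of the instance (x_a, x_b)."""
    undecided = []
    for l, x_l in enumerate(grid_xs):
        p1, p2 = fair_probs(x_t, x_l)
        if p1 == p2:  # fairness could not separate f(x^t) from l * eps_star
            undecided.append(l)
    if len(undecided) == 1:
        # Exactly one undecided cell: f(x^t) is pinned to it, so predict.
        return undecided[0] * eps_star
    return None  # cannot localize f(x^t): answer "I don't know"
```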
24. Conclusions. Fairness in the contextual and classic bandit problems. $\delta$-fair: with probability $1 - \delta$, $\pi_{j|h^t} > \pi_{j'|h^t}$ only if $f_j(x_j^t) > f_{j'}(x_{j'}^t)$. Results. Classic bandits: the number of rounds necessary and sufficient to achieve non-trivial regret is $\Theta(k^3)$. Contextual bandits: a tight relationship with Knows What It Knows (KWIK) learning.