
Introduction to “Fairness in Learning: Classic and Contextual Bandits”


This material is an introduction to the paper “Fairness in Learning: Classic and Contextual Bandits” from NIPS 2016, presented at NIPS2016-Yomi on January 19, 2017.


  1. Introduction to “Fairness in Learning: Classic and Contextual Bandits”, authored by Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth (NIPS 2016). NIPS2016-Yomi, January 19, 2017. Presenter: Kazuto Fukuchi
  2. Fairness in Machine Learning. Consequential decisions made with machine learning may lead to unfair treatment. E.g., Google’s ad suggestion system [Sweeney 13]: searches for African-descent names triggered negative ads (“Arrested?”), while European-descent names received neutral ads. This motivates studying fairness in the contextual bandit problem.
  3. Individual Fairness. Among K persons, choose one for conducting an action (e.g., lend a loan, hire, admit). When may we preferentially choose one person? Only if that person has the largest ability (e.g., payback probability 90% > 60%); there must be no other reason for a preferential choice.
  4. Contextual Bandit Problem. There are K arms with reward functions f_1, …, f_K, unknown to the learner. Each round t:
     1. Obtain a context x_j^t for each arm j
     2. Choose one arm i^t ∈ [K]
     3. Observe reward r_{i^t}^t, where E[r_j^t] = f_j(x_j^t) and r_j^t ∈ [0,1] a.s.
     Goal: maximize the expected cumulative reward E[Σ_t r_{i^t}^t] = E[Σ_t f_{i^t}(x_{i^t}^t)].
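The protocol above can be sketched as a simulation loop. This is a minimal sketch under illustrative assumptions: the rewards are linear (the linear case of the next slide), the parameters θ_j and all constants are made up, and the baseline policy is uniform; the learner only ever sees the contexts and the realized rewards, never θ_j.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, T = 5, 3, 100

# Hidden parameters theta_j with ||theta_j|| <= 1 (unknown to the learner).
theta = rng.normal(size=(K, d))
theta /= np.maximum(1.0, np.linalg.norm(theta, axis=1, keepdims=True))

total_reward = 0.0
for t in range(T):
    # 1. Obtain a context x_j^t for each arm j (||x_j^t|| <= 1).
    x = rng.normal(size=(K, d))
    x /= np.maximum(1.0, np.linalg.norm(x, axis=1, keepdims=True))
    # 2. Choose one arm i^t (here: uniformly at random, a trivially fair baseline).
    i = rng.integers(K)
    # 3. Observe a binary reward with mean f_i(x_i^t) = <theta_i, x_i^t>, clipped to [0, 1].
    mean = np.clip(theta[i] @ x[i], 0.0, 1.0)
    total_reward += rng.binomial(1, mean)
```

Any smarter policy slots into step 2; the environment (steps 1 and 3) stays fixed.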
  5. Example: Linear Contextual Bandit. Define C = {f_θ : f_θ(x) = ⟨θ, x⟩, θ ∈ ℝ^d, ‖θ‖ ≤ 1} and 𝒳 = {x ∈ ℝ^d : ‖x‖ ≤ 1}, and suppose f_j = f_{θ_j} ∈ C and x_j^t ∈ 𝒳. E.g., online recommendation: θ_j is the feature vector of product j, x_j^t is the feature vector of user t regarding product j, and the score of user t for product j is the inner product ⟨x_j^t, θ_j⟩.
  6. Example: Classic Bandit. The expected reward is E[r_j^t] = μ_j. Setting f_j(x_j^t) = μ_j for any x_j^t, the contextual bandit reduces to the classic bandit with means μ_1, …, μ_K.
  7. Regret. The history h^t is a record of the first t − 1 experiences: contexts, arms chosen, and rewards observed. A policy π maps x^t and h^t to a distribution over the arms [K]; π^t_{j|h^t} denotes the probability of choosing arm j at round t given h^t. Regret is the reward dropped compared to the optimal policy:
     Regret(x^1, …, x^T) = Σ_t ( max_j f_j(x_j^t) − E_{i^t∼π^t}[f_{i^t}(x_{i^t}^t)] ).
     An algorithm has regret bound R(T) if max_{x^1,…,x^T} Regret(x^1, …, x^T) ≤ R(T).
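The per-round term inside the regret sum can be computed directly from a policy's arm distribution. The helper name below is illustrative (not from the paper); f_vals[j] plays the role of f_j(x_j^t) and pi is the policy's distribution over arms for that round.

```python
import numpy as np

def round_regret(f_vals, pi):
    """Best arm's expected reward minus the policy's expected reward."""
    f_vals = np.asarray(f_vals, dtype=float)
    pi = np.asarray(pi, dtype=float)
    assert np.isclose(pi.sum(), 1.0)          # pi must be a distribution
    return f_vals.max() - pi @ f_vals         # max_j f_j - E_{i~pi}[f_i]

# A policy that spreads probability over suboptimal arms incurs regret:
r = round_regret([0.9, 0.6, 0.3], [1/3, 1/3, 1/3])   # 0.9 - 0.6 = 0.3
```

Summing this quantity over t = 1, …, T gives Regret(x^1, …, x^T) as defined above.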
  8. Fairness Constraint. It is unfair to preferentially choose one individual without an acceptable reason. A policy π is δ-fair if, with probability at least 1 − δ, for every round t and every pair of arms j, j′:
     π_{j|h^t} > π_{j′|h^t} only if f_j(x_j^t) > f_{j′}(x_{j′}^t),
     i.e., an arm may be chosen with higher probability only if its quality is strictly larger than the other’s.
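As a sketch, the one-round condition can be checked mechanically by comparing every pair of arms. The helper below is illustrative and assumes the true qualities f_j(x_j^t) are given (in the learning problem they are unknown, which is exactly what makes the constraint hard to satisfy).

```python
import numpy as np

def violates_fairness(pi, f_vals):
    """True iff some arm gets higher probability without strictly higher quality."""
    pi, f = np.asarray(pi, dtype=float), np.asarray(f_vals, dtype=float)
    for j in range(len(pi)):
        for jp in range(len(pi)):
            if pi[j] > pi[jp] and not (f[j] > f[jp]):
                return True   # preferential treatment without an acceptable reason
    return False

# Uniform probability over tied arms is fine; favoring a worse arm is not.
ok = violates_fairness([0.5, 0.5], [0.7, 0.7])     # False
bad = violates_fairness([0.8, 0.2], [0.6, 0.9])    # True
```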
  9. Intuition of the Fairness Constraint. The optimal policy is fair, but we cannot obtain it because f_1, …, f_K are unknown. When the learner cannot distinguish which of a group of arms has the higher expected reward, the fairness constraint forces it to choose an arm from that group uniformly at random; arms whose expected reward is lower with high probability may be excluded.
  10. Fairness in Classic Bandit. Maintain confidence intervals on the expected rewards. Starting from the arm with the highest upper bound, chain together every arm whose interval overlaps the chain (e.g., arms 1–3), and choose uniformly from this chained group. Arms whose intervals lie entirely below the chained group (e.g., arms 4–5) have lower expected reward with high probability and may be excluded.
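A minimal sketch of the chaining step, assuming the lower/upper confidence bounds are already computed: sort arms by upper bound and extend the chain while the next interval still overlaps it. The function name and the example interval values are illustrative, not from the paper's pseudocode.

```python
import numpy as np

def chained_set(lower, upper):
    """Arms transitively linked by overlapping intervals to the top arm."""
    order = np.argsort(upper)[::-1]       # arms by upper bound, best first
    chain = [order[0]]
    low = lower[order[0]]                 # lowest lower bound inside the chain
    for j in order[1:]:
        if upper[j] >= low:               # interval overlaps the chain
            chain.append(j)
            low = min(low, lower[j])
        else:
            break                         # remaining arms are confidently worse
    return sorted(chain)

# Arms 1-3 chain together; arms 4-5 are confidently dominated.
lower = np.array([0.70, 0.60, 0.55, 0.10, 0.05])
upper = np.array([0.90, 0.75, 0.65, 0.30, 0.20])
group = chained_set(lower, upper)         # [0, 1, 2]
```

A fair step then plays `rng.choice(group)` uniformly, since no arm in the group can be shown better than another.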
  11. Fair Algorithm for Classic Bandit
  12. Regret Upper Bound. If δ < 1/T, then FairBandits has regret R(T) = O(√(k³ T ln(Tk/δ))).
     • T = Ω(k³) rounds are required to obtain non-trivial regret, i.e., R(T)/T ≪ 1
     • Non-fair case: O(√(kT)); the fairness constraint turns k into k³
     • The dependence on T is optimal
  13. Regret Lower Bound. Any fair algorithm experiences constant per-round regret for at least T = Ω(k³ ln(1/δ)) rounds.
     • Constant per-round regret means the regret is still trivial, so achieving non-trivial regret needs at least Ω(k³) rounds
     • Together with the upper bound, Ω(k³) rounds are necessary and sufficient
  14. Fairness in Contextual Bandit: KWIK learnable = fair-bandit learnable. KWIK (Knows What It Knows) learning is online regression in which, given a feature x^t, the learner outputs either a prediction ŷ^t ∈ [0,1] or ŷ^t = ⊥, where ⊥ denotes “I don’t know”. Only when ŷ^t = ⊥ does the learner observe feedback y^t with E[y^t] = f(x^t).
  15. KWIK Learnable. A class C is (ε, δ)-KWIK learnable with bound m(ε, δ) if, with probability 1 − δ:
     1. ŷ^t ∈ {⊥} ∪ [f(x^t) − ε, f(x^t) + ε] for all t
     2. Σ_{t=1}^∞ 𝟙[ŷ^t = ⊥] ≤ m(ε, δ)
     Intuition: every numeric prediction (ŷ^t ≠ ⊥) is accurate, and ⊥ is answered at most m(ε, δ) times.
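A toy KWIK learner for the simplest class, a constant function f(x) = μ (the classic-bandit case above): it answers ⊥ (here `None`) until it has gathered enough labels, then always predicts the empirical mean. The Hoeffding-style sample bound is an assumption for illustration, not the paper's construction.

```python
import math

class ConstantKWIK:
    """KWIK learner for f(x) = mu: say "don't know" m times, then predict."""

    def __init__(self, eps, delta):
        # Enough i.i.d. labels so the empirical mean is eps-accurate w.p. 1 - delta.
        self.m = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
        self.labels = []

    def predict(self, x):
        if len(self.labels) < self.m:
            return None                   # bottom: request a label this round
        return sum(self.labels) / len(self.labels)

    def feedback(self, y):
        self.labels.append(y)             # only called on rounds where we said None

learner = ConstantKWIK(eps=0.1, delta=0.05)
t = 0
while learner.predict(0.0) is None:       # drive the learner until it commits
    learner.feedback(0.7)                 # noiseless labels for this toy run
    t += 1
```

After the loop, `t` equals the ⊥-budget m(ε, δ) and the prediction is within ε of the true value 0.7, matching conditions 1 and 2 above.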
  16. KWIK Learnability Implies Fair Bandit Learnability. Suppose C is (ε, δ)-KWIK learnable with bound m(ε, δ). Then, for δ ≤ 1/T, there is a δ-fair algorithm for f_j ∈ C with
     R(T) = O( max{ k² m(ε*, min{δ, 1/T} / (T²k)), k³ ln(k/δ) } ),
     where ε* = argmin_ε max{ εT, k·m(ε, min{δ, 1/T} / (T²k)) }.
  17. Linear Contextual Bandit Case. For C = {f_θ : f_θ(x) = ⟨θ, x⟩, θ ∈ ℝ^d, ‖θ‖ ≤ 1} and 𝒳 = {x ∈ ℝ^d : ‖x‖ ≤ 1},
     R(T) = O( max{ T^{4/5} k^{6/5} d^{3/5}, k³ ln(k/δ) } ).
  18. KWIK to Fair
  19. Intuition of KWIKToFair. Run a KWIK algorithm for each arm to predict the expected rewards f_j(x_j^t). When no KWIK output is ⊥, each prediction yields a confidence interval of width 2ε*, so the same chaining strategy as in the classic bandit applies.
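One round of this idea might look as follows, assuming each arm's KWIK learner has already been queried and `predictions[j]` holds its answer (`None` for ⊥). Arms whose learner said ⊥ must be explored so the learner gets a label; otherwise every prediction is a ±ε interval and the classic-bandit chaining rule (inlined here) applies. All names and the selection among ⊥ arms are illustrative simplifications, not the paper's exact procedure.

```python
import random

def kwik_to_fair_round(predictions, eps, rng=random):
    """Pick an arm for one round given per-arm KWIK predictions (None = bottom)."""
    unknown = [j for j, p in enumerate(predictions) if p is None]
    if unknown:
        return rng.choice(unknown)        # explore a "don't know" arm uniformly
    # Chain the +-eps intervals starting from the best prediction.
    order = sorted(range(len(predictions)),
                   key=lambda j: predictions[j], reverse=True)
    chain, low = [order[0]], predictions[order[0]] - eps
    for j in order[1:]:
        if predictions[j] + eps >= low:   # interval overlaps the chain
            chain.append(j)
            low = min(low, predictions[j] - eps)
        else:
            break
    return rng.choice(chain)              # uniform over indistinguishable arms
```

With predictions [0.9, None, 0.5] the ⊥ arm is forced; with [0.9, 0.85, 0.5] and ε = 0.1 the first two arms chain and one is drawn uniformly.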
  20. Fair Bandit Learnability Implies KWIK Learnability. Suppose there is a δ-fair algorithm for f_j ∈ C with regret R(T, δ), and there exist f ∈ C and x^(ℓ) ∈ 𝒳 such that f(x^(ℓ)) = ℓε for ℓ = 1, …, 1/ε. Then there is an (ε, δ)-KWIK algorithm for C whose bound m(ε, δ) is the solution of m(ε, δ)·ε/4 = R(m(ε, δ), εδ/2T).
  21. An Exponential Separation Between Fair and Unfair Learning. Boolean conjunctions: let x ∈ {0,1}^d and C = {f | f(x) = x_{i_1} ∧ ⋯ ∧ x_{i_k}, 0 ≤ k ≤ d, i_1, …, i_k ∈ [d]}.
     • Without the fairness constraint, Boolean conjunctions admit R(T) = O(k² d)
     • For this C, the KWIK bound is at least m(ε, δ) = Ω(2^d)
     • Hence, for δ < 1/(2T), the worst-case regret of any fair algorithm is R(T) = Ω(2^d)
  22. Fair to KWIK
  23. Intuition of FairToKWIK. Divide the range of f(x^t) into bins of width ε*, with grid points x^(0), x^(1), x^(2), … satisfying f(x^(ℓ)) = ℓε*. Using the fair algorithm, compare the query point x^t against each grid point ℓ: let p_{ℓ,1} and p_{ℓ,2} be the probabilities of choosing the grid arm and the query arm, respectively. If p_{ℓ,1} ≠ p_{ℓ,2} for all ℓ except one (say ℓ = 3), then f(x^t) is pinned to that bin, so output 3ε*; otherwise, output ⊥.
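The decision rule on this slide can be sketched as follows: given, for each grid point ℓ, the pair of choice probabilities (p_{ℓ,1}, p_{ℓ,2}) produced by the fair algorithm, output ℓε* when the probabilities tie at exactly one ℓ, and ⊥ (`None`) otherwise. This helper is an illustrative simplification of the reduction, not the paper's full construction.

```python
def fair_to_kwik_output(p, eps):
    """p[l] = (prob. of grid arm x^(l), prob. of query arm x^t); eps is the bin width."""
    # Fairness forces equal probabilities exactly when the two arms are
    # indistinguishable, so a unique tie localizes f(x^t) to one bin.
    tied = [l for l, (p1, p2) in enumerate(p) if p1 == p2]
    if len(tied) == 1:
        return tied[0] * eps   # f(x^t) pinned down: emit a numeric prediction
    return None                # bottom: the fair algorithm has not localized f(x^t)
```

For example, ties at ℓ = 3 only yield the prediction 3ε*, matching the slide.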
  24. Conclusions. Fairness in the contextual and classic bandit problems.
     • δ-fair: with probability 1 − δ, π_{j|h^t} > π_{j′|h^t} only if f_j(x_j^t) > f_{j′}(x_{j′}^t)
     Results
     • Classic bandits: Θ(k³) rounds are necessary and sufficient to achieve non-trivial regret
     • Contextual bandits: a tight relationship with Knows What It Knows (KWIK) learning