Dynamics


Convergence to Nash equilibrium; Socially concave games

  • If the dynamics reaches an NE, it stays there
  • Say why it is important
  • Different slopes
  • Dynamics

    1. On the Convergence of Regret Minimizing Dynamics in Concave Games
        Uri Nadav, Tel Aviv University, Tel Aviv, Israel
        Joint work with Eyal Even-Dar and Yishay Mansour
        Microsoft Research, Cambridge, UK, March 26, 2009
    2. Nash Equilibrium
        • Nash equilibrium is a steady state of the game
            • No player has an incentive to unilaterally deviate from his state
        • Existence (pure strategy)
        • Uniqueness
        • Quality
            • Price of Anarchy/Stability
        • Dynamics: reaching an equilibrium
        Payoff matrix (Player I, Player II):
            0,0   4,1
            1,4   3,3
    3. Dynamics
        [Figure: actions of Player 1 and Player 2 on Days 1 to 5 in the game with payoffs 0,0 / 4,1 / 1,4 / 3,3]
    4. Example Dynamics
        • Best Response
            • On each day, adjust to the other players
            • Ignore the fact that they also adjust
        • Unfortunately, this does not always converge to equilibrium
        [Figure: actions of Player 1 and Player 2 on Days 1 to 5 in the game with payoffs 0,0 / 4,1 / 1,4 / 3,3]
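The failure of best response to converge can be seen on the 2×2 game from the slides. The sketch below is only an illustration under stated assumptions: the four entries are read row by row, Player I picks the row, Player II picks the column, both maximize their payoff, and both update simultaneously. The function names are hypothetical.

```python
# Entry [r][c] = (payoff to Player I, payoff to Player II).
# Assumption: the slide's entries are listed row by row and are payoffs to maximize.
PAYOFFS = [[(0, 0), (4, 1)],
           [(1, 4), (3, 3)]]


def best_row(col):
    """Player I's best response to a fixed column of Player II."""
    return max(range(2), key=lambda r: PAYOFFS[r][col][0])


def best_col(row):
    """Player II's best response to a fixed row of Player I."""
    return max(range(2), key=lambda c: PAYOFFS[row][c][1])


def simultaneous_best_response(row, col, days):
    """Each day both players best-respond to yesterday's profile."""
    history = [(row, col)]
    for _ in range(days):
        row, col = best_row(col), best_col(row)
        history.append((row, col))
    return history


def is_pure_nash(row, col):
    """A profile is a pure Nash equilibrium if each action is a best response."""
    return best_row(col) == row and best_col(row) == col
```

Under these assumptions, starting from the (0,0) cell the play alternates between the (0,0) and (3,3) cells and never reaches either pure equilibrium, while a start at an equilibrium stays there.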
    5. No External Regret
        • Define regret in T time steps as:
            Regret_Alg(T) = (total cost of Alg) − (total cost of best fixed row in hindsight)
        • No single action significantly outperforms the dynamics
        • A procedure is “without external regret” if for every sequence the external regret is sublinear in T
        • Many (different) algorithms can guarantee this [Hannan 57], [Blackwell 56], [Banos 68], [Megiddo 80], [Fudenberg, Levine 94], [Auer et al. 95]…
        [Table: weather example with the actions Umbrella and No umbrella and the cost of Alg; the original shows the numbers 4, 3, 4 and 0, 1, 1, 0, but the layout is not recoverable]
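As one concrete instance of a procedure without external regret, the following is a minimal sketch of the Hedge (multiplicative weights) rule. It is not taken from the slides: the function name, the random umbrella-style cost sequence, and the step size `eta` are assumptions chosen for illustration. With costs in [0, 1] and `eta = sqrt(8 ln N / T)`, the standard analysis bounds the expected regret by `sqrt(T ln N / 2)`, which is sublinear in T.

```python
import math
import random


def hedge_regret(costs, eta):
    """Run Hedge on a cost sequence with entries in [0, 1].

    costs[t][a] is the cost of action a on day t. Returns the external regret:
    (expected total cost of the algorithm) - (total cost of the best fixed action).
    """
    n_actions = len(costs[0])
    weights = [1.0] * n_actions
    totals = [0.0] * n_actions
    alg_cost = 0.0
    for day in costs:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected cost of playing an action drawn from probs today.
        alg_cost += sum(p * c for p, c in zip(probs, day))
        for a, c in enumerate(day):
            totals[a] += c
            weights[a] *= math.exp(-eta * c)  # penalize costly actions
    return alg_cost - min(totals)


# Hypothetical two-action cost sequence (think: umbrella vs. no umbrella).
rng = random.Random(0)
T = 2000
costs = [[float(rng.random() < 0.5), float(rng.random() < 0.4)] for _ in range(T)]
eta = math.sqrt(8 * math.log(2) / T)
regret = hedge_regret(costs, eta)
```

Dividing `regret` by `T` gives the average regret, which is the quantity that has to vanish for the convergence result.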
    6. Our Main Result
        • If each player uses a procedure without regret in some class of interesting games, then their joint play converges to Nash equilibrium
        • Socially concave games include: Resource Allocation, Cournot Oligopoly, Selfish Routing, TCP Congestion Control
    7. Cournot Oligopoly [Cournot 1838]
        • Firms select a production level (supply)
        • Market price depends on total supply
        • Firms maximize their Profit = Revenue − Cost
        • Best response dynamics:
            • Converges for 2 players
            • Diverges for n ≥ 5 [Theocharis 1960]
        • We will show that no-regret dynamics converges to NE for any number of players
        [Figure: firms producing X and Y with costs Cost_1(X) and Cost_2(Y); market price P plotted against overall quantity X + Y]
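The contrast between 2 and 5 firms can be reproduced in a toy market. The sketch below assumes a linear inverse demand P(Q) = a − b·Q and a constant marginal cost c, with simultaneous updates and quantities clipped at zero; these modelling choices and all parameter values are assumptions for illustration, not the setting analyzed in the talk.

```python
def cournot_best_response_dynamics(n, rounds, a=10.0, b=1.0, c=1.0):
    """Simultaneous best-response dynamics in a linear Cournot market.

    Hypothetical model: price P(Q) = a - b*Q and constant marginal cost c,
    so firm i's profit is q_i * (a - b*Q) - c*q_i.
    Returns the final quantities and the symmetric Nash quantity.
    """
    q = [0.0] * n
    for _ in range(rounds):
        total = sum(q)
        # Each firm best-responds to the others' total output from the
        # previous round; negative quantities are clipped to zero.
        q = [max(0.0, (a - c - b * (total - qi)) / (2 * b)) for qi in q]
    q_nash = (a - c) / (b * (n + 1))
    return q, q_nash


q2, nash2 = cournot_best_response_dynamics(n=2, rounds=50)
q5, nash5 = cournot_best_response_dynamics(n=5, rounds=50)
```

In this toy model the two-firm market settles at the Nash quantity, while the five-firm market keeps jumping between zero output and the monopoly best response and never approaches equilibrium.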
    8. Resource Allocation Games
        • Advertisers set budgets: $5M, $10M, $17M, $25M
        • Each advertiser wins a proportional market share
            • The $25M advertiser's allocated rate = 25 / (5 + 10 + 17 + 25)
        • Utility: U = f(allocated rate) − $25M
            • Concave utility from allocated rate
            • Quasi-linear with money
        • Equilibrium:
            • Existence & Uniqueness [Hajek, Gopalakrishnan]
            • Efficiency Loss (POA) 3/4 [Johari, Tsitsiklis]
        • We can show that the best response dynamics generally diverges for linear resource allocation games
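The proportional rule on this slide is simple enough to state as code. This is a minimal sketch: the function names are hypothetical, and the concave value function passed to `quasi_linear_utility` stands in for the slide's unspecified f.

```python
import math


def proportional_shares(bids):
    """Each player receives a share equal to her bid divided by the total bid."""
    total = sum(bids)
    return [bid / total for bid in bids]


def quasi_linear_utility(share, payment, value):
    """Concave value of the allocated share minus the payment.

    `value` is a placeholder for the slide's concave function f.
    """
    return value(share) - payment


shares = proportional_shares([5, 10, 17, 25])  # budgets in $M from the slide
```

With the slide's budgets, the largest advertiser's share is 25/57, roughly 44% of the resource.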
    9. Routing Games
        • Atomic
        • Splittable flows
        • Latency on edge e = L_e(f_{1,L} + f_{2,T})
        • Cost_i = Σ_{p ∈ (s_i, t_i)} Latency(p) · flow_i(p), summing over the paths p from s_i to t_i
        [Figure: network with sources s_1, s_2 and sinks t_1, t_2; player 1 splits flow f_1 into f_{1,L} and f_{1,R}, player 2 splits flow f_2 into f_{2,T} and f_{2,B}]
    10. Socially Concave Games
        • Closed convex strategy set
        • A (weighted) social welfare is concave: there exist λ_1, …, λ_n > 0 such that λ_1 u_1(x) + λ_2 u_2(x) + … + λ_n u_n(x) is concave
        • The utility of a player is convex in the vector of actions of the other players
        • Zero-sum games ⊂ socially concave games
        • Some socially concave games:
            • Subclass of Cournot competition, resource allocation, selfish routing, TCP congestion control (near equilibrium)
        • Concave games [Rosen 65]
            • The utility of a player is a concave function of her own strategy
        • Every socially concave game is a concave game
    11. Our Main Result
        • If each player uses a procedure without regret in a socially concave game, then their joint play converges to Nash equilibrium:
            • The average action profile converges to NE: the average of days 1…T is an ε-Nash equilibrium
            • The average daily payoff of each player converges to her payoff in NE
        [Figure: actions of Player 1, Player 2, …, Player n on Day 1, Day 2, Day 3, …, Day T, and their average over days 1…T]
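The statement about the average action profile can be checked numerically on a small example. The sketch below is an illustration under assumptions, not the authors' experiment: it uses a linear Cournot market (price a − b·Q, marginal cost c), where the sum of profits is concave and each firm's profit is linear, hence convex, in the others' quantities; and it lets every firm run projected online gradient ascent with step `step / sqrt(t)`, a no-regret procedure in the sense of [Zinkevich]. All names and parameter values are hypothetical.

```python
import math


def gradient_dynamics_average(q0, rounds, a=10.0, b=1.0, c=1.0, step=0.2, q_max=10.0):
    """Every firm runs projected online gradient ascent on its own profit.

    Returns the time-averaged quantities over days 1..rounds and the
    symmetric Nash quantity of the linear Cournot market.
    """
    q = list(q0)
    sums = [0.0] * len(q)
    for t in range(1, rounds + 1):
        sums = [s + qi for s, qi in zip(sums, q)]  # record the day-t action
        total = sum(q)
        eta = step / math.sqrt(t)
        # d(profit_i)/d(q_i) = a - c - b*Q - b*q_i; project onto [0, q_max].
        q = [min(q_max, max(0.0, qi + eta * (a - c - b * total - b * qi))) for qi in q]
    average = [s / rounds for s in sums]
    q_nash = (a - c) / (b * (len(q) + 1))
    return average, q_nash


average, q_nash = gradient_dynamics_average([0.0, 1.0, 2.0, 3.0, 4.0], rounds=5000)
```

With five firms, where simultaneous best response fails to settle in the same toy market, the averaged quantities end up close to the Nash quantity (a − c) / (6b).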
    12. Convergence to NE – Proof Outline
        • Goal: show that for every player, the utility from the average action profile equals, up to ε, the utility of playing a best response to the average
        • Definition of ε-Nash equilibrium: utility of player i at the average ≥ utility of i playing a best response to the average − ε
        [Figure: actions of Player 1, Player 2, …, Player n on Day 1, Day 2, Day 3, …, Day T, and their average]
    13. Convergence to NE – Proof Outline
        • Upper bound on the utility of the average action profile, for each player i [equations not preserved in the transcript; their annotations follow]
            • By definition of best response
            • Convexity: the minimum utility is attained when the others play a fixed action
            • Holds for every action z
            • Quantities compared: sum of utilities, utility of the average action profile
    14. Convergence to NE – Proof Outline
        • Lower bound on the sum of average utilities [equations not preserved in the transcript; their annotations follow]
            • By assumption, there exist λ_1, …, λ_n such that λ_1 u_1(x) + … + λ_n u_n(x) is concave
            • Concavity: the sum of utilities at the average is greater than the average utility
            • Quantities compared: utility of the average action profile, (average) sum of utilities
    15. Convergence to NE – Proof Outline
        • Punch line
            • Upper Bound = Lower Bound + Average Regret
            • Upper Bound ≥ utility at the average action profile ≥ Lower Bound
            • Q.E.D.
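The equations of the three proof-outline slides did not survive in the transcript. The following is a hedged reconstruction of the argument from the annotations that remain, not the authors' exact formulas. The notation is assumed: $x^t$ is the action profile on day $t$, $\hat{x} = \frac{1}{T}\sum_{t=1}^{T} x^t$ is the average profile, $R_i(T)$ is player $i$'s external regret (for utilities), and $\lambda_1,\dots,\lambda_n > 0$ are the weights from the definition of a socially concave game.

```latex
% Upper bound: convexity of u_i in the others' actions, then no regret.
% Holds for every fixed action z of player i.
u_i(z, \hat{x}_{-i})
  \;\le\; \frac{1}{T}\sum_{t=1}^{T} u_i(z, x^t_{-i})
  \;\le\; \frac{1}{T}\sum_{t=1}^{T} u_i(x^t) + \frac{R_i(T)}{T}

% Lower bound: concavity of the weighted social welfare.
\sum_{i=1}^{n} \lambda_i\, u_i(\hat{x})
  \;\ge\; \frac{1}{T}\sum_{t=1}^{T} \sum_{i=1}^{n} \lambda_i\, u_i(x^t)

% Punch line: take z = BR_i(\hat{x}_{-i}), weight by \lambda_i, sum over i,
% and subtract the lower bound.
\sum_{i=1}^{n} \lambda_i \Bigl[ u_i\bigl(BR_i(\hat{x}_{-i}), \hat{x}_{-i}\bigr) - u_i(\hat{x}) \Bigr]
  \;\le\; \frac{1}{T}\sum_{i=1}^{n} \lambda_i\, R_i(T)

% Every bracket is non-negative by definition of best response, so for each i:
u_i\bigl(BR_i(\hat{x}_{-i}), \hat{x}_{-i}\bigr) - u_i(\hat{x})
  \;\le\; \frac{1}{\lambda_i T}\sum_{j=1}^{n} \lambda_j\, R_j(T)
```

Since each $R_j(T)$ is sublinear in $T$, the right-hand side tends to zero, so under these assumptions the average profile $\hat{x}$ is an ε-Nash equilibrium with ε vanishing as $T$ grows.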
    16. Convergence in Almost Socially Concave Games
        • The TCP game is a concave game [Karp, Koutsoupias, Papadimitriou, Shenker]
        • And the weighted social welfare is concave
        • But the utility of player i is not convex in the entire strategy space of the other players
        • Therefore, the convergence theorem cannot be applied directly
        • Playing gradient-based dynamics guarantees “playing” in a “socially concave zone”
        • Playing gradient-based dynamics guarantees “no regret” in concave decision making [Zinkevich]
    17. Regret Minimization & Equilibrium
        • Specific games:
            • Routing [Blum, Even-Dar, Ligett]
            • Price of Total Anarchy [Blum, Ligett, Hajiaghayi, Roth]
        • Zero-sum game:
            • Guarantee at least the min-max value
        • Correlated equilibrium:
            • Internal/swap regret dynamics converge to it [Foster, Vohra], [Hart, Mas-Colell], [Blum, Mansour]
    18. Ongoing Research I
        • We studied the allocation of a single link
        • Resource allocation in parallel edges is socially concave
        • Extend to general resource allocation games
            • A set of resources
            • Players buy a path (subset of resources)
        • An equilibrium does not necessarily exist in general networks
            • Always exists in the [Johari, Tsitsiklis] extended game
            • Not socially concave
    19. Ongoing Research II
        • Resource Allocation Game
            • Players act as price anticipators
        • No regret
            • Players have no regret if they believe that they don’t influence the market price
            • Simulation results: fast convergence to market equilibrium
        • Resource Allocation Market
            • Players act as price takers
            • Efficient competitive equilibrium exists (price + bids) [Kelly]
            • Continuous-time algorithms converge to equilibrium [Kelly et al.]
    20. Other stuff I work on
        • Congestion Pricing in a Queue
            • For latency-sensitive agents
        • Protocols for Selfish Agents
            • For accessing a shared resource
        • Fault Tolerant Storage and Quorum Systems
            • In dynamic environments
        • Optimization Algorithms in Ad-Auctions
            • Auctions with broad-match options
        [Figure: FIFO queue with Slot #1 to Slot #6 along a time axis]
    21. Thank you!
        • Questions, comments?
    22. TCP Congestion Control [KKPS 01]
        • User action: push flow f_i
        • Good-put g_i: fraction of f_i forwarded
        • Loss l_i: fraction of f_i discarded
        • User utility: u_i = g_i − α_i · l_i
        • α_i: cost associated with lost flow (retransmission, lost bandwidth, utilization)
        • Fraction of good-put determined by router policy
        [Figure: flow f_i entering a channel and splitting into good-put g_i and loss l_i]
    23. Router Policy
        • Tail Drop
            • Drop an incoming packet when out of space
        • Random Early Discard (RED)
            • Number of dropped packets increases as the queue grows
        • Flow models
            • Amount of flow to discard depends on the total amount of flow
    24. Resource Allocation Games
        • Users choose payment per unit time: $5M, $10M, $17M, $25M
        • Users are allocated rate proportionally:
            • The $25M user's allocated rate = 25 / (5 + 10 + 17 + 25)
        • Utility: U = f(allocated rate) − $25M
            • Concave utility from allocated rate
            • Quasi-linear with money
        • Equilibrium:
            • Existence & Uniqueness [Hajek, Gopalakrishnan]
            • Efficiency Loss (POA) 3/4 [Johari, Tsitsiklis]
        • We can show that the best response dynamics generally diverges for linear resource allocation games
    25. Nash Equilibrium
        • Nash equilibrium is a steady state of the game
            • No player has an incentive to unilaterally deviate from his state
        • Existence (pure strategy)
        • Uniqueness
        • Quality
            • Price of Anarchy/Stability
        • Dynamics: reaching an equilibrium
        Payoff matrix (Player I, Player II), with each row and column labeled ½ on the slide:
            0,0   4,1
            1,4   3,3
