Lesson 35: Game Theory and Linear Programming

This connects two topics of the last few weeks. The optimal strategies for a matrix game turn out to be solutions to linear programming problems. In fact, the strategies are the solutions to the primal and dual versions of the same problem!

• page 32, I think it should be x_1 + 3*x_2 + 2 * x_3

• extremely helpful slides. thank you!



1. 1. Lesson 35: Game Theory and Linear Programming. Math 20, December 14, 2007. Announcements: Pset 12 due December 17 (last day of class); lecture notes and K&H on website; next OH Monday 1–2 (SC 323).
2. 2. Outline. Recap: Definitions; Examples; Fundamental Theorem; Games we can solve so far. GT problems as LP problems: From the continuous to the discrete; Standardization; Rock/Paper/Scissors again. The row player’s LP problem.
4. 4. Definition. A zero-sum game is defined by a payoff matrix $A$, where $a_{ij}$ represents the payoff to the row player if R chooses option $i$ and C chooses option $j$. The row player chooses from the rows of the matrix, and the column player from the columns. The payoff could be a negative number, representing a net gain for the column player.
6. 6. Definition. A strategy for a player consists of a probability vector representing the portion of time each option is employed. We use a row vector $p$ for the row player’s strategy, and a column vector $q$ for the column player’s strategy. A pure strategy (select the same option every time) is represented by a standard basis vector, written as a row $e_i$ for R or as a column $e_j$ for C. For instance, if R has three choices and C has five: $e_2 = \begin{pmatrix} 0 & 1 & 0 \end{pmatrix}$ for R and $e_4 = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 \end{pmatrix}^T$ for C. A non-pure strategy is called mixed.
8. 8. Definition. For an $m \times n$ payoff matrix $A$, the expected value of row and column strategies $p$ and $q$ is the scalar $E(p, q) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i a_{ij} q_j = pAq$. Probabilistically, this is the amount the row player receives (or, if it is negative, the amount the column player receives) if the players employ these strategies.
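The double sum can be evaluated directly. The sketch below is illustrative only: the function name `expected_value` and the sample strategies are assumptions, and the matrix is the usual Rock/Paper/Scissors payoff matrix (win = 1, loss = −1, options ordered rock, paper, scissors).

```python
from fractions import Fraction

def expected_value(p, A, q):
    """Return E(p, q) = sum over i, j of p_i * a_ij * q_j for an m x n matrix A."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))

# Payoffs to the row player; rows and columns are rock, paper, scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

third = Fraction(1, 3)
uniform = [third, third, third]   # mixed strategy
always_rock = [1, 0, 0]           # pure strategy e_1
always_paper = [0, 1, 0]          # pure strategy e_2

print(expected_value(uniform, RPS, uniform))            # 0
print(expected_value(always_rock, RPS, always_paper))   # -1
```

Exact fractions are used so that comparisons between payoffs are not affected by rounding.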
10. 10. Rock/Paper/Scissors. Example: What is the payoff matrix for Rock/Paper/Scissors? Solution: With the options ordered rock, paper, scissors, the payoff matrix is $A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}$.
12. 12. Example. Consider a new game: players R and C each choose a number 1, 2, or 3. If they choose the same thing, C pays R that amount. If they choose differently, R pays C the amount that C has chosen. What is the payoff matrix? Solution: $A = \begin{pmatrix} 1 & -2 & -3 \\ -1 & 2 & -3 \\ -1 & -2 & 3 \end{pmatrix}$.
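The rule of the game translates into a one-line formula: $a_{ij} = i$ when $i = j$ and $a_{ij} = -j$ otherwise. A minimal sketch (the function name `number_game_payoff` and the parameter `n` are assumptions made for illustration):

```python
def number_game_payoff(n=3):
    """Payoff to R: +i if both players choose i, otherwise -j where j is C's choice."""
    return [[i if i == j else -j for j in range(1, n + 1)]
            for i in range(1, n + 1)]

A = number_game_payoff()
for row in A:
    print(row)
# [1, -2, -3]
# [-1, 2, -3]
# [-1, -2, 3]
```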
14. 14. Theorem (Fundamental Theorem of Matrix Games). There exist optimal strategies $p^*$ for R and $q^*$ for C such that for all strategies $p$ and $q$: $E(p^*, q) \ge E(p^*, q^*) \ge E(p, q^*)$. $E(p^*, q^*)$ is called the value $v$ of the game.
15. 15. Reflect on the inequality $E(p^*, q) \ge E(p^*, q^*) \ge E(p, q^*)$. In other words: $E(p^*, q) \ge E(p^*, q^*)$ means R can guarantee a lower bound on his/her payoff; $E(p^*, q^*) \ge E(p, q^*)$ means C can guarantee an upper bound on how much he/she loses. This value could be negative, in which case C has the advantage.
16. 16. Fundamental problem of zero-sum games: find the $p^*$ and $q^*$! Last time we did these: strictly determined games, and 2 × 2 non-strictly-determined games. The general case we’ll look at next.
17. 17. Pure strategies are optimal in strictly determined games. Theorem. Let $A$ be a payoff matrix. If $a_{rs}$ is a saddle point, then $e_r$ is an optimal strategy for R and $e_s$ is an optimal strategy for C. Also $v = E(e_r, e_s) = a_{rs}$.
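A saddle point can be found by brute force, taking the usual definition: an entry that is the smallest in its row and the largest in its column. The sketch below assumes that definition; the function name `saddle_points` and the first sample matrix are made up for illustration, and indices are zero-based.

```python
def saddle_points(A):
    """Return (r, s, value) for every entry that is a row minimum and a column maximum."""
    points = []
    for r, row in enumerate(A):
        for s, entry in enumerate(row):
            column = [A[i][s] for i in range(len(A))]
            if entry == min(row) and entry == max(column):
                points.append((r, s, entry))
    return points

strictly_determined = [[4, 2, 5],
                       [1, 0, 3],
                       [3, 1, 6]]
rock_paper_scissors = [[0, -1, 1],
                       [1, 0, -1],
                       [-1, 1, 0]]

print(saddle_points(strictly_determined))   # [(0, 1, 2)]
print(saddle_points(rock_paper_scissors))   # []
```

In the first matrix R should always play the first row and C the second column, with value 2; Rock/Paper/Scissors has no saddle point, so no pure strategy is optimal.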
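18. 18. Optimal strategies in 2 × 2 non-strictly-determined games. Let $A$ be a 2 × 2 matrix with no saddle points. Then the optimal strategies are $p = \begin{pmatrix} \dfrac{a_{22} - a_{21}}{\Delta} & \dfrac{a_{11} - a_{12}}{\Delta} \end{pmatrix}$ and $q = \begin{pmatrix} \dfrac{a_{22} - a_{12}}{\Delta} \\ \dfrac{a_{11} - a_{21}}{\Delta} \end{pmatrix}$, where $\Delta = a_{11} + a_{22} - a_{12} - a_{21}$. Also $v = \dfrac{|A|}{\Delta}$.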
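These formulas are easy to evaluate exactly. A minimal sketch, assuming the matrix has no saddle point (so that $\Delta \ne 0$); the function name `solve_2x2` and the sample matrix are illustrative, not from the lecture.

```python
from fractions import Fraction

def solve_2x2(A):
    """Optimal p, q and value v for a 2 x 2 game with no saddle point."""
    (a11, a12), (a21, a22) = A
    delta = Fraction(a11 + a22 - a12 - a21)
    p = [(a22 - a21) / delta, (a11 - a12) / delta]   # row player's strategy
    q = [(a22 - a12) / delta, (a11 - a21) / delta]   # column player's strategy
    v = (a11 * a22 - a12 * a21) / delta              # |A| / delta
    return p, q, v

p, q, v = solve_2x2([[3, -1], [-2, 1]])
print([str(x) for x in p], [str(x) for x in q], v)
# ['3/7', '4/7'] ['2/7', '5/7'] 1/7
```

Against this `p` both columns pay R exactly 1/7, and against this `q` both rows pay R exactly 1/7, which is what optimality requires here.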
20. 20. This could get a little weird. This derivation is not something that needs to be memorized, but should be understood at least once.
22. 22. Objectifying the problem. Let’s think about the problem from the column player’s perspective. If she chooses strategy $q$, and R knew it, he would choose $p$ to maximize the payoff $pAq$. Thus the column player wants to minimize that quantity. That is, C’s objective is realized when the payoff is $E = \min_q \max_p pAq$. This seems hard! Luckily, linearity saves us.
24. 24. From the continuous to the discrete. Lemma. Regardless of $q$, we have $\max_p pAq = \max_{1 \le i \le m} e_i A q$. Here $e_i$ is the probability vector that represents the pure strategy of going only with choice $i$. The idea is that a weighted average of things is no bigger than the largest of them. (Think about grades.)
25. 25. Proof of the lemma. We must have $\max_p pAq \ge \max_{1 \le i \le m} e_i A q$ (the maximum over a larger set must be at least as big). On the other hand, let $q$ be C’s strategy. Let the quantity on the right be maximized when $i = i_0$. Let $p$ be any strategy for R. Notice that $p = \sum_i p_i e_i$. So $E(p, q) = pAq = \sum_{i=1}^{m} p_i e_i A q \le \sum_{i=1}^{m} p_i e_{i_0} A q = \left( \sum_{i=1}^{m} p_i \right) e_{i_0} A q = e_{i_0} A q$. Thus $\max_p pAq \le e_{i_0} A q$.
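The lemma can also be spot-checked numerically. The sketch below fixes one column strategy `q` (chosen arbitrarily for illustration), computes the payoff $e_i A q$ of each pure row strategy, and confirms that a thousand randomly generated mixed strategies never beat the best pure one. The helper names and the matrix (the number game from the earlier example) are assumptions of this sketch, and sampling is only a sanity check, not a proof.

```python
import random
from fractions import Fraction

def pure_payoffs(A, q):
    """Return the list of e_i A q: the payoff of each pure row strategy against q."""
    return [sum(a * qj for a, qj in zip(row, q)) for row in A]

def mixed_payoff(p, A, q):
    """E(p, q) = p A q, computed as the p-weighted average of the pure payoffs."""
    return sum(pi * payoff for pi, payoff in zip(p, pure_payoffs(A, q)))

A = [[1, -2, -3], [-1, 2, -3], [-1, -2, 3]]
q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
best_pure = max(pure_payoffs(A, q))

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = []
for _ in range(1000):
    weights = [rng.randint(1, 10) for _ in A]
    total = sum(weights)
    p = [Fraction(w, total) for w in weights]   # a random probability vector
    samples.append(mixed_payoff(p, A, q))

print(best_pure)                    # -1/4
print(max(samples) <= best_pure)    # True
```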
26. 26. The next step is to introduce a new variable $v$ representing the value of this inner maximization. Our objective is to minimize it. Saying it’s the maximum of all payoffs from pure strategies is the same as saying $v \ge e_i A q$ for all $i$. So we finally have something that looks like an LP problem! We want to choose $q$ and $v$ which minimize $v$ subject to the constraints $v \ge e_i A q$ for $i = 1, 2, \dots, m$; $q_j \ge 0$ for $j = 1, 2, \dots, n$; and $\sum_{j=1}^{n} q_j = 1$.
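27. 27. Trouble with this formulation: Simplex method with equalities? Not in standard form. Resolution: We may assume all $a_{ij} > 0$ (adding the same constant to every entry shifts the value of the game but not the optimal strategies), so $v > 0$. Let $x_j = \dfrac{q_j}{v}$.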
28. 28. Since we know $v > 0$, we still have $x \ge 0$. Now $\sum_{j=1}^{n} x_j = \frac{1}{v} \sum_{j=1}^{n} q_j = \frac{1}{v}$. Minimizing $v$ is the same as maximizing $1/v$, so our problem is now to choose $x \ge 0$ which maximizes $\sum_j x_j$. The constraints now take the form $v \ge e_i A q \iff 1 \ge e_i A x$, for all $i$. Another way to write this is $Ax \le \mathbf{1}$, where $\mathbf{1}$ is the vector consisting of all ones.
29. 29. Upshot. Theorem. Consider a game with payoff matrix $A$, where each entry of $A$ is positive. The column player’s optimal strategy $q$ is $\dfrac{x}{x_1 + \cdots + x_n}$, where $x \ge 0$ solves the LP problem of maximizing $x_1 + \cdots + x_n$ subject to the constraints $Ax \le \mathbf{1}$.
31. 31. Rock/Paper/Scissors. The payoff matrix is $A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}$. We can add 2 to everything to make $\tilde A = \begin{pmatrix} 2 & 1 & 3 \\ 3 & 2 & 1 \\ 1 & 3 & 2 \end{pmatrix}$.
32. 32. Convert to LP. The problem is to maximize $x_1 + x_2 + x_3$ subject to the constraints $2x_1 + x_2 + 3x_3 \le 1$, $3x_1 + 2x_2 + x_3 \le 1$, $x_1 + 3x_2 + 2x_3 \le 1$. We introduce slack variables $y_1$, $y_2$, and $y_3$, so the constraints now become $2x_1 + x_2 + 3x_3 + y_1 = 1$, $3x_1 + 2x_2 + x_3 + y_2 = 1$, $x_1 + 3x_2 + 2x_3 + y_3 = 1$.
33. 33. An easy initial basic solution is to let $x = 0$ and $y = \mathbf{1}$. The initial tableau is therefore
           x1     x2     x3     y1     y2     y3      z  value
    y1      2      1      3      1      0      0      0      1
    y2      3      2      1      0      1      0      0      1
    y3      1      3      2      0      0      1      0      1
    z      -1     -1     -1      0      0      0      1      0
34. 34. Which should be the entering variable? The coefficients in the bottom row are all the same, so let’s just pick one, $x_1$. To find the departing variable, we look at the ratios 1/2, 1/3, and 1/1. The smallest is 1/3, so $y_2$ is the departing variable. We scale row 2 by 1/3:
           x1     x2     x3     y1     y2     y3      z  value
    y1      2      1      3      1      0      0      0      1
    y2      1    2/3    1/3      0    1/3      0      0    1/3
    y3      1      3      2      0      0      1      0      1
    z      -1     -1     -1      0      0      0      1      0
35. 35. Then we use row operations to zero out the rest of column one:
           x1     x2     x3     y1     y2     y3      z  value
    y1      0   -1/3    7/3      1   -2/3      0      0    1/3
    x1      1    2/3    1/3      0    1/3      0      0    1/3
    y3      0    7/3    5/3      0   -1/3      1      0    2/3
    z       0   -1/3   -2/3      0    1/3      0      1    1/3
36. 36. We can still improve this: $x_3$ is the entering variable and $y_1$ is the departing variable. The new tableau is
           x1     x2     x3     y1     y2     y3      z  value
    x3      0   -1/7      1    3/7   -2/7      0      0    1/7
    x1      1    5/7      0   -1/7    3/7      0      0    2/7
    y3      0   18/7      0   -5/7    1/7      1      0    3/7
    z       0   -3/7      0    2/7    1/7      0      1    3/7
37. 37. Finally, entering $x_2$ and departing $y_3$ gives
           x1     x2     x3     y1     y2     y3      z  value
    x3      0      0      1   7/18  -5/18   1/18      0    1/6
    x1      1      0      0   1/18   7/18  -5/18      0    1/6
    x2      0      1      0  -5/18   1/18   7/18      0    1/6
    z       0      0      0    1/6    1/6    1/6      1    1/2
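The three pivots can be replayed mechanically with exact fractions. This is a sketch of the row operations only, with the pivot positions supplied by hand in the order chosen on the slides; it is not a full simplex implementation, and the function name `pivot` is an assumption.

```python
from fractions import Fraction

def pivot(tableau, r, c):
    """Scale row r so entry (r, c) becomes 1, then clear column c in every other row."""
    pivot_row = [entry / tableau[r][c] for entry in tableau[r]]
    result = []
    for i, row in enumerate(tableau):
        if i == r:
            result.append(pivot_row)
        else:
            factor = row[c]
            result.append([a - factor * b for a, b in zip(row, pivot_row)])
    return result

# Columns: x1 x2 x3 y1 y2 y3 z value. Rows: y1, y2, y3, z.
initial = [[2, 1, 3, 1, 0, 0, 0, 1],
           [3, 2, 1, 0, 1, 0, 0, 1],
           [1, 3, 2, 0, 0, 1, 0, 1],
           [-1, -1, -1, 0, 0, 0, 1, 0]]
tableau = [[Fraction(entry) for entry in row] for row in initial]

# Zero-indexed (row, column) pivots: x1 replaces y2, x3 replaces y1, x2 replaces y3.
for r, c in [(1, 0), (0, 2), (2, 1)]:
    tableau = pivot(tableau, r, c)

for row in tableau:
    print([str(entry) for entry in row])
```

The last printed row is the objective row; its final entry is the maximum $z = 1/2$.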
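38. 38. So the $x$ variables have values $x_1 = 1/6$, $x_2 = 1/6$, $x_3 = 1/6$. Furthermore $z = x_1 + x_2 + x_3 = 1/2$, so $v = 1/z = 2$ for the shifted matrix $\tilde A$ (the original game therefore has value $2 - 2 = 0$). Since $q_j = v x_j$, this also means that $q_1 = 1/3$, $q_2 = 1/3$, and $q_3 = 1/3$. So the optimal strategy is to do each thing the same number of times.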
40. 40. Now let’s think about the problem from the row player’s perspective. If he chooses strategy $p$, and C knew it, she would choose $q$ to minimize the payoff $pAq$. Thus the row player wants to maximize that quantity. That is, R’s objective is realized when the payoff is $E = \max_p \min_q pAq$.
41. 41. Lemma. Regardless of $p$, we have $\min_q pAq = \min_{1 \le j \le n} pAe_j$.
42. 42. The next step is to introduce a new variable $\tilde v$ representing the value of this inner minimization. Our objective is to maximize it. Saying it’s the minimum of all payoffs from pure strategies is the same as saying $\tilde v \le pAe_j$ for all $j$. Again, we have something that looks like an LP problem! We want to choose $p$ and $\tilde v$ which maximize $\tilde v$ subject to the constraints $\tilde v \le pAe_j$ for $j = 1, 2, \dots, n$; $p_i \ge 0$ for $i = 1, 2, \dots, m$; and $\sum_{i=1}^{m} p_i = 1$.
43. 43. As before, we can standardize this by renaming $y = \dfrac{1}{\tilde v} p^T$ (this makes $y$ a column vector). Then $\sum_{i=1}^{m} y_i = \dfrac{1}{\tilde v}$, so maximizing $\tilde v$ is the same as minimizing $\mathbf{1}^T y$. Likewise, the equations of constraint become $\tilde v \le (\tilde v y^T) A e_j$ for all $j$, or $y^T A \ge \mathbf{1}^T$, or (taking transposes) $A^T y \ge \mathbf{1}$. If all the entries of $A$ are positive, we may assume that $\tilde v$ is positive, so the constraints $p \ge 0$ are satisfied if and only if $y \ge 0$.
44. 44. Upshot. Theorem. Consider a game with $m \times n$ payoff matrix $A$, where each entry of $A$ is positive. The row player’s optimal strategy $p$ is $\dfrac{y^T}{y_1 + \cdots + y_m}$, where $y \ge 0$ solves the LP problem of minimizing $y_1 + \cdots + y_m = \mathbf{1}^T y$ subject to the constraints $A^T y \ge \mathbf{1}$.
45. 45. The big idea. The big observation is this: Theorem. The row player’s LP problem is the dual of the column player’s LP problem.
46. 46. The final tableau in the Rock/Paper/Scissors LP problem was this:
           x1     x2     x3     y1     y2     y3      z  value
    x3      0      0      1   7/18  -5/18   1/18      0    1/6
    x1      1      0      0   1/18   7/18  -5/18      0    1/6
    x2      0      1      0  -5/18   1/18   7/18      0    1/6
    z       0      0      0    1/6    1/6    1/6      1    1/2
The entries in the objective row below the slack variables are the solutions to the dual problem! In this case, we have the same values, which means R has the same strategy as C. This reflects the symmetry of the original game.
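That reading of the tableau can be verified directly: the values under the slack columns should satisfy the row player's constraints $\tilde A^T y \ge \mathbf{1}$, and the two objectives should agree. A small check, assuming the shifted matrix $\tilde A$ (every entry of the Rock/Paper/Scissors matrix plus 2); the variable names are illustrative.

```python
from fractions import Fraction

A_shifted = [[2, 1, 3], [3, 2, 1], [1, 3, 2]]
sixth = Fraction(1, 6)
x = [sixth] * 3   # primal solution: the value column of the final tableau
y = [sixth] * 3   # dual solution: the objective row under y1, y2, y3

A_x = [sum(a * xj for a, xj in zip(row, x)) for row in A_shifted]
At_y = [sum(A_shifted[i][j] * y[i] for i in range(3)) for j in range(3)]

primal_feasible = all(value <= 1 for value in A_x)    # A x <= 1
dual_feasible = all(value >= 1 for value in At_y)     # A^T y >= 1
same_objective = sum(x) == sum(y)                     # both equal 1/2

p = [yi / sum(y) for yi in y]   # row player's strategy
q = [xj / sum(x) for xj in x]   # column player's strategy
print(primal_feasible, dual_feasible, same_objective)   # True True True
print([str(pi) for pi in p], [str(qj) for qj in q])
```

A feasible primal point and a feasible dual point with equal objective values are both optimal, so the check confirms both strategies at once.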
48. 48. Example. Consider the game: players R and C each choose a number 1, 2, or 3. If they choose the same thing, C pays R that amount. If they choose differently, R pays C the amount that C has chosen. What should each do? Answer:
    Choice       R        C
    1        22.7%    54.5%
    2        36.4%    27.3%
    3        40.9%    18.2%
Exactly, R plays $(5/22, 4/11, 9/22)$ and C plays $(6/11, 3/11, 2/11)$. The value of the game is $-6/11$, so the expected payoff is about 0.545 to the column player.
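A proposed answer to this game can be checked with the two lemmas: R's guarantee is the smallest of the payoffs $pAe_j$, C's guarantee is the largest of the payoffs $e_i A q$, and a pair of strategies is optimal exactly when the two numbers coincide. The candidate vectors below are exact fractions obtained by equalizing payoffs; treat them as assumptions to be checked, which is what the sketch does.

```python
from fractions import Fraction as F

# Number game: a_ii = i, a_ij = -j for i != j.
A = [[1, -2, -3], [-1, 2, -3], [-1, -2, 3]]
p = [F(5, 22), F(4, 11), F(9, 22)]   # candidate strategy for R
q = [F(6, 11), F(3, 11), F(2, 11)]   # candidate strategy for C

# Payoff of p against each pure column, and of each pure row against q.
against_columns = [sum(p[i] * A[i][j] for i in range(3)) for j in range(3)]
against_rows = [sum(A[i][j] * q[j] for j in range(3)) for i in range(3)]

row_guarantee = min(against_columns)   # R receives at least this much
col_guarantee = max(against_rows)      # C gives up at most this much

print(row_guarantee, col_guarantee)              # -6/11 -6/11
print([f"{float(x):.1%}" for x in p])            # ['22.7%', '36.4%', '40.9%']
print([f"{float(x):.1%}" for x in q])            # ['54.5%', '27.3%', '18.2%']
```

Since the two guarantees agree, both strategies are optimal and the value of the game is $-6/11$: the game favors the column player.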