Csr2011 june14 16_30_ibsen-jensen

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration

  1. 1. The complexity of solving reachability games using value and strategy iteration. Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen. Aarhus University, Denmark. CSR 2011, 14th June
  2. 2. Overview
       • What are concurrent reachability games?
       • Two standard algorithms for solving concurrent reachability games:
           • The value iteration algorithm
           • The strategy iteration algorithm
       • Exemplify important facts for the proof of the time lower bound for both algorithms
  3. 3. Matrix games (von Neumann 1928). Payoff matrix, rows separated by semicolons: [0 1 -1; -1 0 1; 1 -1 0]
  4. 4. Matrix games (von Neumann 1928). Payoff matrix, rows separated by semicolons: [0 1 -1; -1 0 1; 1 -1 0]
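The matrix on this slide has the rock-paper-scissors pattern. As a small illustrative check (not part of the talk), the sketch below assumes the nine numbers are read row by row as the row player's payoffs and confirms that the uniform mixed strategy guarantees payoff 0 for both players, so the value of this matrix game is 0. The helper names `row_guarantee` and `column_guarantee` are hypothetical.

```python
from fractions import Fraction

# Row player's payoffs, read row by row from the slide (an assumption about layout).
A = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

def row_guarantee(matrix, x):
    """Worst-case expected payoff of mixed row strategy x over all pure columns."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return min(sum(x[i] * matrix[i][j] for i in range(n_rows)) for j in range(n_cols))

def column_guarantee(matrix, y):
    """Largest expected payoff the row player can get against mixed column strategy y."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return max(sum(matrix[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows))

uniform = [Fraction(1, 3)] * 3
lower = row_guarantee(A, uniform)      # what the row player can secure
upper = column_guarantee(A, uniform)   # what the column player can hold the row player to
print(lower, upper)
```

Because the lower and upper guarantees coincide, the uniform strategy is optimal for both players in this particular matrix.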
  5. 5. Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998). Dante* vs. Lucifer*. Each entry can be either 0, 1 or a pointer. Matrix shown: [0 1 -1; -1 0 1; 1 -1 0]. (* Naming convention from Hansen, Koucky and Miltersen, 2009.)
  6. 6. Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998). Dante* vs. Lucifer*. Each entry can be either 0, 1 or a pointer. (* Naming convention from Hansen, Koucky and Miltersen, 2009.)
  7. 7. Concurrent reachability games Everett 1957/de Alfaro, Henzinger, Kupferman 1998 Each entry can be either 0, 1 or a pointer 3/42
  8. 8. Concurrent reachability games Everett 1957/de Alfaro, Henzinger, Kupferman 1998 Each entry can be either 0 , 1 or a pointer 3/42 0 0 0 0 0 0 0 0 0 0 0 0
  9. 9. Concurrent reachability games Everett 1957/de Alfaro, Henzinger, Kupferman 1998 Each entry can be either 0, 1 or a pointer 3/42 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0
  10. 10. Concurrent reachability games Everett 1957/de Alfaro, Henzinger, Kupferman 1998 Each entry can be either 0, 1 or a pointer S: 3/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  11. 11. Concurrent reachability games Everett 1957/de Alfaro, Henzinger, Kupferman 1998 Each entry can be either 0, 1 or a pointer S: 3/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  12. 12. Histories Each entry can be either 0, 1 or a pointer S: 4/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  13. 13. Histories and strategies
       • History: A sequence of positions, together with the choices made by each player in each position.
       • Strategy: A map from histories to probability distributions over the choices available in the position reached after the history.
       • $S_1$: Set of strategies for Dante
       • $S_2$: Set of strategies for Lucifer
       • $H_1$/$H_2$: Sets of stationary strategies (strategies that depend only on the position reached after the history)
  14. 14. Payoffs
       • $v(i, \sigma, \pi)$: The probability of eventually reaching a 1 from position $i$, if Dante plays strategy $\sigma$ and Lucifer plays strategy $\pi$.
  15. 15. Everett 1957 Value of i 7/42
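The formula on this slide is not present in the transcript. In the notation of the preceding slides, the standard statement that each position $i$ has a value reads as follows; this is a reconstruction from the usual definition, not a copy of the slide.

```latex
v_i \;=\; \sup_{\sigma \in S_1} \, \inf_{\pi \in S_2} v(i, \sigma, \pi)
    \;=\; \inf_{\pi \in S_2} \, \sup_{\sigma \in S_1} v(i, \sigma, \pi)
```

The supremum is used rather than a maximum because Dante may only have strategies that come arbitrarily close to the value without attaining it.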
  16. 16. Algorithmic problems
       • Quantitatively solving a game: Given the game, compute the value of all positions.
       • Strategically solving a game: Given the game and $\varepsilon > 0$, compute $\sigma$ such that for all $\pi$ and $i$: $v(i, \sigma, \pi) > v_i - \varepsilon$.
  17. 17. Value iteration (Shapley 1953)
       • Value iteration computes the value of each position in $G_t$ in iteration $t$, on the basis of the value of each position in $G_{t-1}$.
       • $G_t$: A modified version of $G$, where Dante loses after $t$ moves.
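The update described on this slide can be written as a recurrence. The symbols $v^t_i$ (value of position $i$ in $G_t$) and $A_i(\cdot)$ (the matrix at position $i$ with its pointers filled in) are introduced here for illustration and do not appear on the slides.

```latex
v^0_i = 0, \qquad
v^t_i = \operatorname{val}\bigl(A_i(v^{t-1})\bigr) \quad \text{for } t \ge 1,
```

where $A_i(v)$ is the matrix of position $i$ in which every pointer to a position $j$ is replaced by the number $v_j$ (entries 0 and 1 are kept), and $\operatorname{val}$ is the value of the resulting matrix game.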
  18. 18. Our results: Lower bound for value iteration
       • There exists a concurrent reachability game $G$, with $N$ matrices and $m$ rows and columns in each matrix, such that:
       • $\mathrm{val}(G) = 1$ and
       • $\mathrm{val}(G_t) = 3m^{-N/2}$, for $t = 2^{m^{N/2}}$
  19. 19. Our results: Upper bound for value iteration
       • For any concurrent reachability game $G$:
       • $\mathrm{val}(G) - \mathrm{val}(G_t) < \varepsilon$ for $t = (1/\varepsilon)^{m^{O(N)}}$
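To get a feeling for the magnitudes in the lower bound, the following sketch evaluates it for one small instance. It assumes the lower-bound slide is read as t = 2^(m^(N/2)) and value 3·m^(-N/2), and that N is even; the function name `lower_bound_point` is hypothetical.

```python
from fractions import Fraction

def lower_bound_point(m, N):
    """Return (t, value) for the lower-bound slide, read as
    t = 2^(m^(N/2)) and value = 3 * m^(-N/2). Assumes N is even."""
    if N % 2 != 0:
        raise ValueError("this sketch only handles even N")
    half_power = m ** (N // 2)          # m^(N/2)
    return 2 ** half_power, Fraction(3, half_power)

# Example: m = 2 actions per player, N = 8 matrices.
t, v = lower_bound_point(2, 8)
print(t, v)  # number of iterations and the value reached by then
```

Under this reading, a game with only 8 small matrices whose true value is 1 is still valued at 3/16 after 65536 iterations.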
  20. 20. Value iteration example – G 0 S: 12/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  21. 21. Value iteration example – G 0 S: 0 0 0 0 12/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  22. 22. Value iteration example – G 1 S: 0 0 0 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  23. 23. Value iteration example – G 1 S: 0 0 0 0 0 0 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  24. 24. Value iteration example – G 1 S: 0 0 0 0 1 1 1 1 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1 0 0 0
  25. 25. Value iteration example – G 1 S: 0 0 0 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1 1 0 0 1 0 1
  26. 26. Value iteration example – G 1 S: 0 0 0 0 0 0 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1 1 0 0 1 0 1
  27. 27. Value iteration example – G 1 0 S: 0.33333/ 0 0 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1 1 0 0 0 1 0 0 0 1
  28. 28. Value iteration example – G 1 S: 0 0 0.33333/ 0 0 0 0 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  29. 29. Value iteration example – G 1 S: 0 0 0 0 0 0 0 0 0 0.33333/ 0 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1 0 0 0
  30. 30. Value iteration example – G 1 S: 0 0.33333/ 0 0 0/ 0 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1 0 0 0 0 0 0 0 0 0
  31. 31. Value iteration example – G 1 S: 0 0 0 0.33333/ 0 0/ 0/ 0/ 13/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  32. 32. Value iteration example – G 2 S: 0 0 0 0.33333/ 0.33333 0.11111/ 0/ 0/ 14/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  33. 33. Value iteration example – G 3 S: 0.11111 0 0 0.33333/ 0.33333 0.11111/ 0/ 0.03704/ 15/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  34. 34. Value iteration example – G 4 S: 0.11111 0.03704 0 0.33333/ 0.33333 0.11111/ 0.01235/ 0.03704/ 16/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  35. 35. Value iteration example – G 5 S: 0.11111 0.03704 0.01235 0.33748/ 0.33333 0.11533/ 0.01754/ 0.04147/ 17/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  36. 36. Value iteration example – G 6 S: 0.11533 0.04147 0.01754 0.33925/ 0.33748 0.11855/ 0.02172/ 0.04493/ 18/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  37. 37. Value iteration example – G 7 S: 0.11855 0.04493 0.02172 0.34068/ 0.33925 0.12064/ 0.02519/ 0.04772/ 19/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  38. 38. Value iteration example – G 8 S: 0.12064 0.04772 0.02519 0.34187/ 0.34068 0.12388/ 0.02815/ 0.04991/ 20/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  39. 39. Value iteration example – G 9 S: 0.12388 0.04991 0.02815 0.34378/ 0.34187 0.12517/ 0.03070/ 0.05129 / 21/42 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S 1 0 0 S 1 0 S S 1
  40. 40. Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06). Was conjectured to be fast.
  41. 41. Our results: Upper bound for strategy iteration
       • An $\varepsilon$-optimal strategy is computed after $t = (1/\varepsilon)^{m^{O(N)}}$ iterations of strategy iteration.
       • This follows from the corresponding results for value iteration.
  42. 42. Our results: Lower bound for strategy iteration
       • There exists a concurrent reachability game $G$, with $N$ matrices, for large $N$, and $m$ rows and columns in each matrix, such that:
       • $\mathrm{val}(G) = 1$ and
       • The strategy obtained by strategy iteration guarantees winning probability at most $4m^{-N/2}$, for $t = 2^{m^{N/4}}$
       Table: Strategy iteration, m=2. Number of iterations needed to get over 1/2, by N:
       • N = 7: 18446744073709551617
       • N = 8: 340282366920938463463374607431768211457
       • N = 9: 115792089237316195423570985008687907853269984665640564039457584007913129639937
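The three iteration counts in the table on this slide follow a simple pattern. The slide does not state it, so the check below is an observation rather than a claim from the talk: for N = 7, 8, 9 the printed numbers equal 2^(2^(N-1)) + 1.

```python
# Iteration counts as printed on the slide (strategy iteration, m = 2),
# keyed by the number of matrices N.
table = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}

# Candidate closed form, checked only against these three entries.
pattern = {N: 2 ** (2 ** (N - 1)) + 1 for N in table}

matches = all(table[N] == pattern[N] for N in table)
print(matches)
```

The count squares (up to the +1) each time N grows by one, which is the doubly-exponential growth the lower bound describes.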
  43. 43. Strategy iteration: Before iteration 1 S: <ul><li>Start strategy for Dante:= Uniform </li></ul>25/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  44. 44. Strategy iteration: Before iteration 1 S <ul><li>Start strategy for Dante:= Uniform </li></ul>0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 25/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  45. 45. Strategy iteration: Iteration 1 <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>S 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 26/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  46. 46. Strategy iteration: Iteration 1 <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>S 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 26/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  47. 47. Strategy iteration: Iteration 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>S 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 26/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  48. 48. Strategy iteration: Iteration 1 1 0.66667 The numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used. <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>S 0 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 26/42 S 0 0 0 S 0 S S 0 0 S 0 S S 0 0 S S 0 0 S 0 S S
  49. 49. Strategy iteration: Iteration 1 0 1 0.66667 The numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used. 0.66667 0.66667 0.66667 0.66667 0.66667 0.66667 0.66667 <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>26/42
  50. 50. Strategy iteration: Iteration 1 <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>S 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 26/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  51. 51. Strategy iteration: Iteration 1 0.11111 0.03704 0.01235 0.33333 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 26/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  52. 52. Strategy iteration: Iteration 1 0.11111 0.03704 0.01235 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.01235 0 0 0 S 1 1 1 <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.01235 0.01235 0.01235 0.33748 26/42 1 0 0 S 1 0 S S 1 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  53. 53. Strategy iteration: Iteration 1 S 0.11111 0.03704 0.01235 0.33333 0.33748 0.33332 0.32920 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 0.33333 26/42 <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  54. 54. Strategy iteration: Iteration 1 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11111 0.03704 0.01235 0.33333 0.33748 0.33332 0.32920 0.34599 0.33317 0.32084 0.37327 0.33180 0.29493 0.47368 0.31579 0.21053 26/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  55. 55. Strategy iteration: Iteration 2 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11111 0.03704 0.01235 0.33333 0.33748 0.33332 0.32920 0.34599 0.33317 0.32084 0.37327 0.33180 0.29493 0.47368 0.31579 0.21053 27/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  56. 56. Strategy iteration: Iteration 2 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11111 0.03704 0.01235 0.33333 0.33748 0.33332 0.32920 0.34599 0.33317 0.32084 0.37327 0.33180 0.29493 0.47368 0.31579 0.21053 27/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  57. 57. Strategy iteration: Iteration 2 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11111 0.03704 0.01235 0.33333 0.33748 0.33332 0.32920 0.34599 0.33317 0.32084 0.37327 0.33180 0.29493 0.47368 0.31579 0.21053 27/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  58. 58. Strategy iteration: Iteration 2 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11677 0.04359 0.02065 0.33748 0.33748 0.33332 0.32920 0.34599 0.33317 0.32084 0.37327 0.33180 0.29493 0.47368 0.31579 0.21053 27/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  59. 59. Strategy iteration: Iteration 2 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11677 0.04359 0.02065 0.33748 0.34031 0.33329 0.32640 0.35458 0.33289 0.31253 0.39987 0.33180 0.32917 0.55453 0.29186 0.15361 27/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  60. 60. Strategy iteration: Iteration 3 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11677 0.04359 0.02065 0.33748 0.34031 0.33329 0.32640 0.35458 0.33289 0.31253 0.39987 0.33180 0.32917 0.55453 0.29186 0.15361 28/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  61. 61. Strategy iteration: Iteration 3 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11677 0.04359 0.02065 0.33748 0.34031 0.33329 0.32640 0.35458 0.33289 0.31253 0.39987 0.33180 0.32917 0.55453 0.29186 0.15361 28/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  62. 62. Strategy iteration: Iteration 3 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.11677 0.04359 0.02065 0.33748 0.34031 0.33329 0.32640 0.35458 0.33289 0.31253 0.39987 0.33180 0.32917 0.55453 0.29186 0.15361 28/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  63. 63. Strategy iteration: Iteration 3 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12067 0.04825 0.02676 0.34031 0.34031 0.33329 0.32640 0.35458 0.33289 0.31253 0.39987 0.33180 0.32917 0.55453 0.29186 0.15361 28/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  64. 64. Strategy iteration: Iteration 3 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12067 0.04825 0.02676 0.34031 0.34031 0.33329 0.32640 0.35458 0.33289 0.31253 0.39987 0.33180 0.32917 0.55453 0.29186 0.15361 28/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  65. 65. Strategy iteration: Iteration 3 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12067 0.04825 0.02676 0.34031 0.34241 0.33325 0.32434 0.36097 0.33259 0.30644 0.41947 0.32646 0.25407 0.60831 0.27098 0.12071 28/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  66. 66. Strategy iteration: Iteration 4 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12067 0.04825 0.02676 0.34031 0.34241 0.33325 0.32434 0.36097 0.33259 0.30644 0.41947 0.32646 0.25407 0.60831 0.27098 0.12071 29/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  67. 67. Strategy iteration: Iteration 4 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12067 0.04825 0.02676 0.34031 0.34241 0.33325 0.32434 0.36097 0.33259 0.30644 0.41947 0.32646 0.25407 0.60831 0.27098 0.12071 29/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  68. 68. Strategy iteration: Iteration 4 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12067 0.04825 0.02676 0.34031 0.34241 0.33325 0.32434 0.36097 0.33259 0.30644 0.41947 0.32646 0.25407 0.60831 0.27098 0.12071 29/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  69. 69. Strategy iteration: Iteration 4 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12067 0.04825 0.02676 0.34031 0.34241 0.33325 0.32434 0.36097 0.33259 0.30644 0.41947 0.32646 0.25407 0.60831 0.27098 0.12071 29/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  70. 70. Strategy iteration: Iteration 4 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12360 0.05185 0.03154 0.34241 0.34241 0.33325 0.32434 0.36097 0.33259 0.30644 0.41947 0.32646 0.25407 0.60831 0.27098 0.12071 29/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  71. 71. Strategy iteration: Iteration 4 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12360 0.05185 0.03154 0.34241 0.34241 0.33325 0.32434 0.36097 0.33259 0.30644 0.41947 0.32646 0.25407 0.60831 0.27098 0.12071 29/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  72. 72. Strategy iteration: Iteration 4 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12360 0.05185 0.03154 0.34241 0.34407 0.33322 0.32271 0.36601 0.33230 0.30169 0.43486 0.32390 0.24125 0.64720 0.25350 0.09930 29/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  73. 73. Strategy iteration: Iteration 5 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12360 0.05185 0.03154 0.34241 0.34407 0.33322 0.32271 0.36601 0.33230 0.30169 0.43486 0.32390 0.24125 0.64720 0.25350 0.09930 30/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  74. 74. Strategy iteration: Iteration 5 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12360 0.05185 0.03154 0.34241 0.34407 0.33322 0.32271 0.36601 0.33230 0.30169 0.43486 0.32390 0.24125 0.64720 0.25350 0.09930 30/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  75. 75. Strategy iteration: Iteration 5 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12360 0.05185 0.03154 0.34241 0.34407 0.33322 0.32271 0.36601 0.33230 0.30169 0.43486 0.32390 0.24125 0.64720 0.25350 0.09930 30/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  76. 76. Strategy iteration: Iteration 5 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12593 0.05476 0.03544 0.34407 0.34407 0.33322 0.32271 0.36601 0.33230 0.30169 0.43486 0.32390 0.24125 0.64720 0.25350 0.09930 30/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  77. 77. Strategy iteration: Iteration 5 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12593 0.05476 0.03544 0.34407 0.34543 0.33319 0.32138 0.37015 0.33202 0.29783 0.44745 0.32152 0.23103 0.67692 0.23882 0.08426 30/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  78. 78. Strategy iteration: Iteration 6 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12593 0.05476 0.03544 0.34407 0.34543 0.33319 0.32138 0.37015 0.33202 0.29783 0.44745 0.32152 0.23103 0.67692 0.23882 0.08426 31/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  79. 79. Strategy iteration: Iteration 6 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12593 0.05476 0.03544 0.34407 0.34543 0.33319 0.32138 0.37015 0.33202 0.29783 0.44745 0.32152 0.23103 0.67692 0.23882 0.08426 31/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  80. 80. Strategy iteration: Iteration 6 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12593 0.05476 0.03544 0.34407 0.34543 0.33319 0.32138 0.37015 0.33202 0.29783 0.44745 0.32152 0.23103 0.67692 0.23882 0.08426 31/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  81. 81. Strategy iteration: Iteration 6 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12786 0.05721 0.03873 0.34543 0.34543 0.33319 0.32138 0.37015 0.33202 0.29783 0.44745 0.32152 0.23103 0.67692 0.23882 0.08426 31/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  82. 82. Strategy iteration: Iteration 6 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12786 0.05721 0.03873 0.34543 0.34543 0.33319 0.32138 0.37015 0.33202 0.29783 0.44745 0.32152 0.23103 0.67692 0.23882 0.08426 31/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  83. 83. Strategy iteration: Iteration 6 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12786 0.05721 0.03873 0.34543 0.34658 0.33316 0.32026 0.37366 0.33177 0.29457 0.45807 0.31933 0.22260 0.70055 0.22633 0.07312 31/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  84. 84. Strategy iteration: Iteration 6 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12786 0.05721 0.03873 0.34543 0.34658 0.33316 0.32026 0.37366 0.33177 0.29457 0.45807 0.31933 0.22260 0.70055 0.22633 0.07312 31/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  85. 85. Strategy iteration: Iteration 7 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12786 0.05721 0.03873 0.34543 0.34658 0.33316 0.32026 0.37366 0.33177 0.29457 0.45807 0.31933 0.22260 0.70055 0.22633 0.07312 32/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  86. 86. Strategy iteration: Iteration 7 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12786 0.05721 0.03873 0.34543 0.34658 0.33316 0.32026 0.37366 0.33177 0.29457 0.45807 0.31933 0.22260 0.70055 0.22633 0.07312 32/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  87. 87. Strategy iteration: Iteration 7 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12950 0.05932 0.04156 0.34658 0.34658 0.33316 0.32026 0.37366 0.33177 0.29457 0.45807 0.31933 0.22260 0.70055 0.22633 0.07312 32/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  88. 88. Strategy iteration: Iteration 7 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12950 0.05932 0.04156 0.34658 0.34658 0.33316 0.32026 0.37366 0.33177 0.29457 0.45807 0.31933 0.22260 0.70055 0.22633 0.07312 32/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  89. 89. Strategy iteration: Iteration 7 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12950 0.05932 0.04156 0.34658 0.34758 0.33313 0.31929 0.37670 0.33153 0.29177 0.46723 0.31730 0.21547 0.71988 0.21557 0.06455 32/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  90. 90. Strategy iteration: Iteration 8 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12950 0.05932 0.04156 0.34658 0.34758 0.33313 0.31929 0.37670 0.33153 0.29177 0.46723 0.31730 0.21547 0.71988 0.21557 0.06455 33/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  91. 91. Strategy iteration: Iteration 8 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12950 0.05932 0.04156 0.34658 0.34758 0.33313 0.31929 0.37670 0.33153 0.29177 0.46723 0.31730 0.21547 0.71988 0.21557 0.06455 33/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  92. 92. Strategy iteration: Iteration 8 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.12950 0.05932 0.04156 0.34658 0.34758 0.33313 0.31929 0.37670 0.33153 0.29177 0.46723 0.31730 0.21547 0.71988 0.21557 0.06455 33/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  93. 93. Strategy iteration: Iteration 8 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13093 0.06118 0.04404 0.34758 0.34758 0.33313 0.31929 0.37670 0.33153 0.29177 0.46723 0.31730 0.21547 0.71988 0.21557 0.06455 33/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  94. 94. Strategy iteration: Iteration 8 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13093 0.06118 0.04404 0.34758 0.34845 0.33311 0.31844 0.37937 0.33130 0.28933 0.47527 0.31541 0.20932 0.73606 0.20618 0.05776 33/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  95. 95. Strategy iteration: Iteration 9 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13093 0.06118 0.04404 0.34758 0.34845 0.33311 0.31844 0.37937 0.33130 0.28933 0.47527 0.31541 0.20932 0.73606 0.20618 0.05776 34/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  96. 96. Strategy iteration: Iteration 9 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13093 0.06118 0.04404 0.34758 0.34845 0.33311 0.31844 0.37937 0.33130 0.28933 0.47527 0.31541 0.20932 0.73606 0.20618 0.05776 34/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  97. 97. Strategy iteration: Iteration 9 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13093 0.06118 0.04404 0.34758 0.34845 0.33311 0.31844 0.37937 0.33130 0.28933 0.47527 0.31541 0.20932 0.73606 0.20618 0.05776 34/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  98. 98. Strategy iteration: Iteration 9 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13219 0.06283 0.04624 0.34845 0.34845 0.33311 0.31844 0.37937 0.33130 0.28933 0.47527 0.31541 0.20932 0.73606 0.20618 0.05776 34/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  99. 99. Strategy iteration: Iteration 9 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13219 0.06283 0.04624 0.34845 0.34845 0.33311 0.31844 0.37937 0.33130 0.28933 0.47527 0.31541 0.20932 0.73606 0.20618 0.05776 34/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  100. 100. Strategy iteration: Iteration 9 S <ul><li>Best response for Lucifer </li></ul><ul><li>Calculate values from those strategies </li></ul><ul><li>Update strategy for Dante </li></ul>0.13219 0.06283 0.04624 0.34845 0.34923 0.33309 0.31768 0.38176 0.33109 0.28715 0.48241 0.31366 0.20393 0.74985 0.19791 0.05224 34/42 1 0 0 S 1 0 S S 1 0 0 S 0 S S 0 0 S 0 S S 0 0 S 0 S S
  101. 101. Generalized Purgatory P(N,m)
       • Lucifer repeatedly hides a number between 1 and m.
       • Dante must try to guess the number.
       • If he guesses correctly N times in a row, he goes to heaven.
       • If he ever guesses incorrectly by overshooting Lucifer’s number, he goes to hell.
  102. 102. Interesting fact
       • Dante can make the probability of going to heaven from purgatory arbitrarily close to 1 by playing well enough.
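The rules of P(N,m) can be encoded as N matrices with m rows and columns each, matching the "0, 1 or a pointer" format of the earlier slides. The sketch below is one possible encoding. It assumes that an undershooting guess resets the streak of correct guesses to zero (which is how "in a row" is read here), and the names `purgatory_step` and `purgatory_matrices` are hypothetical.

```python
HEAVEN, HELL = "heaven", "hell"

def purgatory_step(streak, guess, hidden, N):
    """Outcome of one round: HEAVEN, HELL, or the new streak length (an int)."""
    if guess > hidden:
        return HELL                      # overshooting loses immediately
    if guess == hidden:
        return HEAVEN if streak + 1 == N else streak + 1
    return 0                             # assumption: undershooting resets the streak

def purgatory_matrices(N, m):
    """One m-by-m matrix per streak length 0..N-1.
    Rows are Dante's guesses, columns are Lucifer's hidden numbers.
    Entries: 1 (heaven), 0 (hell), or ("S", k), a pointer to matrix k."""
    matrices = []
    for streak in range(N):
        rows = []
        for guess in range(1, m + 1):
            row = []
            for hidden in range(1, m + 1):
                out = purgatory_step(streak, guess, hidden, N)
                if out == HEAVEN:
                    row.append(1)
                elif out == HELL:
                    row.append(0)
                else:
                    row.append(("S", out))
            rows.append(row)
        matrices.append(rows)
    return matrices

print(purgatory_matrices(1, 2))
```

For N = 1 and m = 2 this gives a single matrix with rows (1, pointer to itself) and (0, 1).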
  103. 103. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices 37/42 1 0 1 0 0 1 0 1 1 0 1
  104. 104. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix t:=0 Strategy iteration on 3 matrices 37/42 1 0 1 0 0 1 0 1 1 0 1
  105. 105. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=0 0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 37/42 1 0 1 0 0 1 0 1 1 0 1
  106. 106. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=1 0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 38/42 1 0 1 0 0 1 0 1 1 0 1
  107. 107. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=1 0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 38/42 1 0 1 0 0 1 0 1 1 0 1
  108. 108. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.25 0.125 38/42 1 0 1 0 0 1 0 1 1 0 1
  109. 109. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=1 0.5 0.66667 0.33333 0.66667 0.33333 0.57143 0.42857 0.53333 0.46667 0.5 0.5 0.25 0.125 38/42 1 0 1 1 0 1 0 0 1 0 1
  110. 110. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=2 0.5 0.66667 0.33333 0.66667 0.33333 0.57143 0.42857 0.53333 0.46667 0.5 0.5 0.25 0.125 39/42 1 0 1 1 0 1 0 0 1 0 1
  111. 111. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=2 0.5 0.66667 0.33333 0.66667 0.33333 0.57143 0.42857 0.53333 0.46667 0.5 0.5 0.25 0.125 39/42 1 0 1 1 0 1 0 0 1 0 1
  112. 112. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=2 0.66667 0.66667 0.33333 0.66667 0.33333 0.57143 0.42857 0.53333 0.46667 0.66667 0.53333 0.30476 0.20317 39/42 1 0 1 1 0 1 0 0 1 0 1
  113. 113. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=2 0.66667 0.75000 0.25000 0.75000 0.25000 0.61765 0.38235 0.55654 0.44346 0.66667 0.53333 0.30476 0.20317 39/42 1 0 1 1 0 1 0 0 1 0 1
  114. 114. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=3 0.66667 0.75000 0.25000 0.75000 0.25000 0.61765 0.38235 0.55654 0.44346 0.66667 0.53333 0.30476 0.20317 40/42 1 0 1 1 0 1 0 0 1 0 1
  115. 115. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=3 0.66667 0.75000 0.25000 0.75000 0.25000 0.61765 0.38235 0.55654 0.44346 0.66667 0.53333 0.30476 0.20317 40/42 1 0 1 1 0 1 0 0 1 0 1
  116. 116. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=3 0.75000 0.75000 0.25000 0.75000 0.25000 0.61765 0.38235 0.55654 0.44346 0.75000 0.55654 0.34374 0.25781 40/42 1 0 1 1 0 1 0 0 1 0 1
  117. 117. Exemplifying important facts Value iteration on 1 matrix Strategy iteration on 1 matrix Strategy iteration on 3 matrices t:=3 0.75000 0.80000 0.20000 0.80000 0.20000 0.65072 0.34928 0.57399 0.42601 0.75000 0.55654 0.34374 0.25781 41/42 1 0 1 1 0 1 0 0 1 0 1
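The one-matrix numbers on these slides (values 0.5, 0.66667, 0.75 and Dante's strategies 0.66667/0.33333, 0.75/0.25, 0.8/0.2) can be reproduced with a short sketch. It assumes the single matrix is P(1,2), written as [[1, S], [0, 1]] with rows for Dante's guesses and columns for Lucifer's hidden numbers, and it hard-codes that game rather than implementing either algorithm in general. All function names are hypothetical.

```python
from fractions import Fraction

def val_2x2(a, b, c, d):
    """Value of the zero-sum matrix game [[a, b], [c, d]], row player maximizing."""
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:               # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)

def value_iteration(steps):
    """Values of G_1..G_steps: replace the pointer S by the previous value."""
    v = Fraction(0)
    history = []
    for _ in range(steps):
        v = val_2x2(Fraction(1), v, Fraction(0), Fraction(1))
        history.append(v)
    return history

def strategy_iteration(steps):
    """Pairs (value of Dante's current strategy, Dante's updated probability of guessing 1)."""
    p = Fraction(1, 2)                   # uniform start strategy
    history = []
    for _ in range(steps):
        # Lucifer's best pure response to p.
        # Hide 1: Dante wins exactly when he guesses 1, probability p.
        # Hide 2: guessing 1 repeats the round, guessing 2 wins, so Dante wins
        # with probability 1 whenever p < 1.
        win_if_hide_1 = p
        win_if_hide_2 = Fraction(1) if p < 1 else Fraction(0)
        v = min(win_if_hide_1, win_if_hide_2)
        # Update: optimal row strategy in [[1, v], [0, 1]], from the
        # indifference condition p' = p' * v + (1 - p').
        p = 1 / (2 - v)
        history.append((v, p))
    return history

print([float(v) for v in value_iteration(3)])
print([(float(v), float(p)) for v, p in strategy_iteration(3)])
```

In this game both procedures produce the value sequence t/(t+1), so each needs on the order of 1/ε iterations to get within ε of the true value 1.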
  118. 118. The end
       • Open problems:
       • Find a fast algorithm for the problem
           • There exists a PSPACE algorithm for the problem, but it is not fast.
       • Thanks for listening