Csr2011 june14 16_30_ibsen-jensen

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration

Transcript

  • 1. The complexity of solving reachability games using value and strategy iteration. Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen. Aarhus University, Denmark. CSR 2011, 14 June
  • 2. Overview
    • What are concurrent reachability games?
    • Two standard algorithms for solving concurrent reachability games:
      • The value iteration algorithm
      • The strategy iteration algorithm
    • Exemplify important facts for the proof of the time lower bound for both algorithms
  • 3–4. Matrix games (von Neumann 1928)
    • Example payoff matrix:
       0   1  -1
      -1   0   1
       1  -1   0
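As a small illustration of what solving a matrix game means, the sketch below reads the nine numbers on the slide as a 3×3 payoff matrix, row by row (an assumption about the lost layout), and checks that the uniform row strategy earns expected payoff 0 against every column. Because the matrix is skew-symmetric, 0 is also the value of the game. The function name `payoff_against_columns` is made up for this example.

```python
from fractions import Fraction

# Payoff matrix from the slide, read row by row (payoff to the row player).
# The row-by-row reading is an assumption about the flattened transcript.
A = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]


def payoff_against_columns(matrix, row_strategy):
    """Expected payoff of a mixed row strategy against each pure column."""
    return [sum(p * row[j] for p, row in zip(row_strategy, matrix))
            for j in range(len(matrix[0]))]


uniform = [Fraction(1, 3)] * 3
guarantees = payoff_against_columns(A, uniform)
# Every entry is 0: the column player cannot push the payoff below 0.
```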
  • 5–11. Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998)
    • Dante vs. Lucifer (naming convention from Hansen, Koucky and Miltersen, 2009)
    • Each entry can be either 0, 1 or a pointer
    • [Figure: the example game S, built up step by step]
  • 12. Histories
    • Each entry can be either 0, 1 or a pointer
    • [Figure: the example game S]
  • 13. Histories and strategies
    • History: Sequence of positions and choices for each player in each position.
    • Strategy: Map from histories to probability distributions over choices in the position we arrive at after the history
    • S_1: Set of strategies for Dante
    • S_2: Set of strategies for Lucifer
    • H_1/H_2: Sets of stationary strategies (strategies that depend only on the position we arrive at after the history)
  • 14. Payoffs
    • v(i, σ, π): The probability of eventually reaching a 1 from position i, if Dante plays by strategy σ and Lucifer by π.
  • 15. Value of i (Everett 1957)
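For reference, the value of a position in a concurrent reachability game is usually defined as below. This is the standard form from the literature, written with the strategy sets S_1, S_2 and the payoff v(i, σ, π) of the neighbouring slides; it is not copied from the slide itself.

```latex
v_i \;=\; \sup_{\sigma \in S_1}\, \inf_{\pi \in S_2}\, v(i,\sigma,\pi)
    \;=\; \inf_{\pi \in S_2}\, \sup_{\sigma \in S_1}\, v(i,\sigma,\pi)
```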
  • 16. Algorithmic problems
    • Quantitatively solving a game: Given the game, compute the value of all positions.
    • Strategically solving a game: Given the game and ε > 0, compute σ such that for all π and i: v(i, σ, π) > v_i − ε.
  • 17. Value iteration (Shapley 1953)
    • Value iteration computes the value of each position in G_t in iteration t, on the basis of the value of each position in G_{t-1}.
    • G_t: A modified version of G, where Dante loses after t moves.
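The update described on this slide can be written as a recurrence. The notation here is an assumption, not the slide's own: A_i(v) denotes the matrix of position i with every pointer to a position j replaced by the number v_j, and val(·) denotes the value of a matrix game.

```latex
v^{0}_i = 0, \qquad
v^{t}_i = \operatorname{val}\!\bigl(A_i(v^{t-1})\bigr) \quad \text{for } t \ge 1 ,
```

so that v^t_i is the value of position i in G_t.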
  • 18. Our results: Lower bound for value iteration
    • There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, so that:
    • val(G) = 1 and
    • $\mathrm{val}(G_t) = 3\,m^{-N/2}$ for $t = 2^{m^{N/2}}$
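To give a feel for the size of this bound, here is one worked instance. It reads the flattened exponents on the slide as 3 m^{-N/2} and t = 2^{m^{N/2}}, and the choice m = 4, N = 4 is only an illustration.

```latex
m = 4,\; N = 4:\qquad
3\,m^{-N/2} = 3 \cdot 4^{-2} = \tfrac{3}{16} = 0.1875,
\qquad
t = 2^{m^{N/2}} = 2^{4^{2}} = 2^{16} = 65536 .
```

Under that reading, value iteration on such a game still reports a value below 0.19 after 65536 iterations, although the true value is 1.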
  • 19. Our results: Upper bound for value iteration
    • For any concurrent reachability game G
    • $\mathrm{val}(G) - \mathrm{val}(G_t) < \varepsilon$ for $t = (1/\varepsilon)^{m^{O(N)}}$
  • 20–39. Value iteration example: G_0 to G_9
    • [Figure: the example game S, with the value of each position updated in every iteration]
    • Values of the four positions in G_t, listed in decreasing order:
    • G_0: 0, 0, 0, 0
    • G_1: 0.33333, 0, 0, 0
    • G_2: 0.33333, 0.11111, 0, 0
    • G_3: 0.33333, 0.11111, 0.03704, 0
    • G_4: 0.33333, 0.11111, 0.03704, 0.01235
    • G_5: 0.33748, 0.11533, 0.04147, 0.01754
    • G_6: 0.33925, 0.11855, 0.04493, 0.02172
    • G_7: 0.34068, 0.12064, 0.04772, 0.02519
    • G_8: 0.34187, 0.12388, 0.04991, 0.02815
    • G_9: 0.34378, 0.12517, 0.05129, 0.03070
  • 40. Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06)
    • Was conjectured to be fast
  • 41. Our results: Upper bound for strategy iteration
    • An ε-optimal strategy is computed after $t = (1/\varepsilon)^{m^{O(N)}}$ iterations of strategy iteration
    • This follows from the corresponding results for value iteration
  • 42. Our results: Lower bound for strategy iteration
    • There exists a concurrent reachability game G, with N matrices, for large N, and m rows and columns in each matrix, so that:
    • val(G) = 1 and
    • The strategy obtained by strategy iteration guarantees winning probability at most $4\,m^{-N/2}$ for $t = 2^{m^{N/4}}$
    • Strategy iteration, m = 2: number of iterations needed to get over 1/2
      • N = 7: 18446744073709551617
      • N = 8: 340282366920938463463374607431768211457
      • N = 9: 115792089237316195423570985008687907853269984665640564039457584007913129639937
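The three iteration counts in the table for m = 2 can be checked against a closed form. The sketch below pairs each count with the N that follows it in the transcript (an assumption about the lost table layout) and tests whether it equals 2^(2^(N-1)) + 1. The slide does not state this formula, and nothing here says whether it holds for other N; the name `matches_pattern` is made up for the example.

```python
# Iteration counts copied from the slide's table for m = 2, keyed by N.
table = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}


def matches_pattern(n, count):
    """True if count equals 2^(2^(n-1)) + 1 (a pattern observed here, not a claim from the slide)."""
    return count == 2 ** (2 ** (n - 1)) + 1


checks = {n: matches_pattern(n, count) for n, count in table.items()}
```

Python integers have arbitrary precision, so the comparison is exact.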
  • 43–44. Strategy iteration: Before iteration 1
    • Start strategy for Dante := Uniform (each of the 12 row probabilities is 0.33333)
    • [Figure: the example game S]
  • 45–54. Strategy iteration: Iteration 1
    • Best response for Lucifer
    • Calculate values from those strategies
    • Update strategy for Dante
    • The numbers on the edges are the probability that the edge is used. Edges without a number are used with probability 0.33333.
    • Values of the four positions: 0.11111, 0.03704, 0.01235, 0.33333
    • Updated strategy for Dante, one probability triple per position: (0.33748, 0.33332, 0.32920), (0.34599, 0.33317, 0.32084), (0.37327, 0.33180, 0.29493), (0.47368, 0.31579, 0.21053)
  • 55–59. Strategy iteration: Iteration 2
    • Values of the four positions: 0.11677, 0.04359, 0.02065, 0.33748
    • Updated strategy for Dante, one probability triple per position: (0.34031, 0.33329, 0.32640), (0.35458, 0.33289, 0.31253), (0.39987, 0.33180, 0.32917), (0.55453, 0.29186, 0.15361)
  • 60–65. Strategy iteration: Iteration 3
    • Values of the four positions: 0.12067, 0.04825, 0.02676, 0.34031
    • Updated strategy for Dante, one probability triple per position: (0.34241, 0.33325, 0.32434), (0.36097, 0.33259, 0.30644), (0.41947, 0.32646, 0.25407), (0.60831, 0.27098, 0.12071)
  • 66–72. Strategy iteration: Iteration 4
    • Values of the four positions: 0.12360, 0.05185, 0.03154, 0.34241
    • Updated strategy for Dante, one probability triple per position: (0.34407, 0.33322, 0.32271), (0.36601, 0.33230, 0.30169), (0.43486, 0.32390, 0.24125), (0.64720, 0.25350, 0.09930)
  • 73–77. Strategy iteration: Iteration 5
    • Values of the four positions: 0.12593, 0.05476, 0.03544, 0.34407
    • Updated strategy for Dante, one probability triple per position: (0.34543, 0.33319, 0.32138), (0.37015, 0.33202, 0.29783), (0.44745, 0.32152, 0.23103), (0.67692, 0.23882, 0.08426)
  • 78–84. Strategy iteration: Iteration 6
    • Values of the four positions: 0.12786, 0.05721, 0.03873, 0.34543
    • Updated strategy for Dante, one probability triple per position: (0.34658, 0.33316, 0.32026), (0.37366, 0.33177, 0.29457), (0.45807, 0.31933, 0.22260), (0.70055, 0.22633, 0.07312)
  • 85–89. Strategy iteration: Iteration 7
    • Values of the four positions: 0.12950, 0.05932, 0.04156, 0.34658
    • Updated strategy for Dante, one probability triple per position: (0.34758, 0.33313, 0.31929), (0.37670, 0.33153, 0.29177), (0.46723, 0.31730, 0.21547), (0.71988, 0.21557, 0.06455)
  • 90–94. Strategy iteration: Iteration 8
    • Values of the four positions: 0.13093, 0.06118, 0.04404, 0.34758
    • Updated strategy for Dante, one probability triple per position: (0.34845, 0.33311, 0.31844), (0.37937, 0.33130, 0.28933), (0.47527, 0.31541, 0.20932), (0.73606, 0.20618, 0.05776)
  • 95–100. Strategy iteration: Iteration 9
    • Values of the four positions: 0.13219, 0.06283, 0.04624, 0.34845
    • Updated strategy for Dante, one probability triple per position: (0.34923, 0.33309, 0.31768), (0.38176, 0.33109, 0.28715), (0.48241, 0.31366, 0.20393), (0.74985, 0.19791, 0.05224)
  • 101. Generalized Purgatory P(N,m)
    • Lucifer repeatedly hides a number between 1 and m.
    • Dante must try to guess the number.
    • If he guesses correctly N times in a row, he goes to heaven.
    • If he ever guesses incorrectly by overshooting Lucifer’s number, he goes to hell.
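The rules above can be turned into a small value iteration sketch for m = 2. It rests on assumptions that the slide does not state: an undershooting guess sends Dante back to the first position, position i means "i correct guesses so far", and the matrix of position i is [[ahead, start], [0, ahead]] with rows for Dante's guess and columns for Lucifer's hidden number. The function names are made up for this example.

```python
def solve_2x2(a, b, c, d):
    """Value of the zero-sum matrix game [[a, b], [c, d]] (row player maximises)."""
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:          # saddle point: pure strategies suffice
        return maximin
    return (a * d - b * c) / (a + d - b - c)


def value_iteration(n, steps):
    """Value vectors of G_0 .. G_steps for the assumed game P(n, 2)."""
    v = [0.0] * n                   # G_0: Dante has no moves left, so every value is 0
    history = [v[:]]
    for _ in range(steps):
        nxt = []
        for i in range(n):
            # Value reached after a correct guess: heaven (1) from the last position.
            ahead = 1.0 if i == n - 1 else v[i + 1]
            # Rows: Dante guesses 1 or 2. Columns: Lucifer hides 1 or 2.
            # Guess 1 vs hidden 2 undershoots (assumed: back to position 0),
            # guess 2 vs hidden 1 overshoots (hell, payoff 0).
            nxt.append(solve_2x2(ahead, v[0], 0.0, ahead))
        v = nxt
        history.append(v[:])
    return history
```

With a single matrix, `value_iteration(1, 3)` gives 0.5, 0.66667 and 0.75 for t = 1, 2, 3, which agrees with the value iteration numbers on the "Exemplifying important facts" slides.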
  • 102. Interesting fact
    • The probability that Dante goes to heaven from purgatory is nearly 1, if he plays well enough.
  • 103–117. Exemplifying important facts
    • Three runs are shown side by side: value iteration on 1 matrix, strategy iteration on 1 matrix, and strategy iteration on 3 matrices.
    • t = 0: value iteration starts from value 0; both strategy iteration runs start from probability 0.5 on every row.
    • t = 1: value 0.5 for value iteration and for strategy iteration on 1 matrix; values 0.5, 0.25, 0.125 for strategy iteration on 3 matrices. Updated strategies: (0.66667, 0.33333) on 1 matrix; (0.66667, 0.33333), (0.57143, 0.42857), (0.53333, 0.46667) on 3 matrices.
    • t = 2: value 0.66667 for value iteration and for strategy iteration on 1 matrix; values 0.53333, 0.30476, 0.20317 for strategy iteration on 3 matrices. Updated strategies: (0.75000, 0.25000) on 1 matrix; (0.75000, 0.25000), (0.61765, 0.38235), (0.55654, 0.44346) on 3 matrices.
    • t = 3: value 0.75000 for value iteration and for strategy iteration on 1 matrix; values 0.55654, 0.34374, 0.25781 for strategy iteration on 3 matrices. Updated strategies: (0.80000, 0.20000) on 1 matrix; (0.80000, 0.20000), (0.65072, 0.34928), (0.57399, 0.42601) on 3 matrices.
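The three steps of strategy iteration (best response for Lucifer, values, strategy update) can be sketched for Generalized Purgatory with m = 2. This is a simplified illustration and not the authors' implementation. It assumes that an undershooting guess sends Dante back to the first position, finds Lucifer's best response by brute force over his pure stationary strategies, and updates Dante's strategy in every position, whereas the published algorithm only changes positions whose value improves. All function names are made up for the example.

```python
from itertools import product


def evaluate(p, hides):
    """Values of all positions when Dante guesses 1 with probability p[i] and
    Lucifer always hides hides[i] (1 or 2) in position i."""
    alpha, beta = 1.0, 0.0          # heaven: value = 1 + 0 * v_start
    coeffs = []
    for i in reversed(range(len(p))):
        if hides[i] == 1:           # guess 1 advances, guess 2 overshoots (hell)
            alpha, beta = p[i] * alpha, p[i] * beta
        else:                       # guess 2 advances, guess 1 undershoots (assumed restart)
            alpha, beta = (1 - p[i]) * alpha, (1 - p[i]) * beta + p[i]
        coeffs.append((alpha, beta))
    coeffs.reverse()
    # Position 0 satisfies v_start = alpha + beta * v_start.
    start = 0.0 if beta >= 1.0 else alpha / (1.0 - beta)
    return [a + b * start for a, b in coeffs]


def best_response_values(p):
    """Values under Lucifer's best pure stationary reply (brute force over 2^n choices)."""
    return min((evaluate(p, h) for h in product((1, 2), repeat=len(p))), key=sum)


def strategy_iteration(n, steps):
    """Return a list of (values, strategy) pairs, one per iteration."""
    p = [0.5] * n                   # uniform start strategy for Dante
    trace = []
    for _ in range(steps):
        v = best_response_values(p)
        trace.append((v, p))
        new_p = []
        for i in range(n):
            ahead = 1.0 if i == n - 1 else v[i + 1]
            # Optimal probability of guessing 1 in [[ahead, v[0]], [0, ahead]];
            # valid while ahead > v[0] >= 0, which holds from a uniform start.
            new_p.append(ahead / (2 * ahead - v[0]))
        p = new_p
    return trace
```

Under these assumptions, `strategy_iteration(3, 3)` yields the values 0.5, 0.25, 0.125 in the first iteration and 0.53333, 0.30476, 0.20317 in the second (listed here from the last position to the first), in line with the numbers for three matrices on the slides above.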
  • 118. The end
    • Open problems:
    • Find a fast algorithm for the problem
      • There exists a PSPACE algorithm for the problem, but it is not fast.
    • Thanks for listening