Csr2011 june14 16_30_ibsen-jensen
Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration


Presentation Transcript

  • The complexity of solving reachability games using value and strategy iteration. Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen (Aarhus University, Denmark). CSR 2011, June 14
  • Overview
    • What are concurrent reachability games?
    • Two standard algorithms for solving concurrent reachability games:
      • The value iteration algorithm
      • The strategy iteration algorithm
    • Exemplify important facts for the proof of the time lower bound for both algorithms
  • Matrix games (von Neumann 1928)
    • Example payoff matrix: $\begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}$
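As a small illustration of what solving a matrix game means, the sketch below checks that the uniform mixed strategy guarantees payoff 0 to both players in the 3×3 matrix on the slide, so the value of that game is 0. The function names are illustrative, and the sketch assumes that the row player maximizes and the column player minimizes.

```python
from fractions import Fraction

# Payoff matrix from the slide (assumption: rows maximize, columns minimize).
A = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

def row_guarantee(matrix, p):
    """Worst-case expected payoff for the row player when mixing rows with p."""
    n_cols = len(matrix[0])
    return min(sum(p[i] * matrix[i][j] for i in range(len(matrix)))
               for j in range(n_cols))

def col_guarantee(matrix, q):
    """Largest expected payoff the column player can concede when mixing columns with q."""
    return max(sum(q[j] * row[j] for j in range(len(row))) for row in matrix)

uniform = [Fraction(1, 3)] * 3
lower = row_guarantee(A, uniform)   # what the row player can secure
upper = col_guarantee(A, uniform)   # what the column player can hold the row player to
print(lower, upper)
```

When the two guarantees coincide, as they do here, the common number is the value of the matrix game and both strategies are optimal.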
  • Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998)
    • Dante vs. Lucifer (naming convention from Hansen, Koucky and Miltersen, 2009)
    • Each entry can be either 0, 1 or a pointer
    • [Figure: example game made of 3×3 matrices with entries 0, 1 and pointers; entries marked S point to the position labelled S]
  • Histories and strategies
    • History: Sequence of positions and choices for each player in each position.
    • Strategy: Map from histories to probability distributions over choices in the position we arrive at after the history.
    • $S_1$: Set of strategies for Dante
    • $S_2$: Set of strategies for Lucifer
    • $H_1$/$H_2$: Sets of stationary strategies (strategies that depend only on the position we arrive at after the history)
  • Payoffs
    • $v(i, \sigma, \pi)$: The probability to eventually reach a 1 from position i, if Dante plays by strategy σ and Lucifer by π.
  • Value of position i (Everett 1957) [formula not preserved in the transcript]
  • Algorithmic problems
    • Quantitatively solving a game: Given the game, compute the value of all positions.
    • Strategically solving a game: Given the game and ε > 0, compute σ such that for all π and i: $v(i, \sigma, \pi) > v_i - \varepsilon$.
  • Value iteration (Shapley 1953)
    • $G_t$: A modified version of G, where Dante loses after t moves.
    • Value iteration computes the value of each position in $G_t$ in iteration t, on the basis of the value of each position in $G_{t-1}$.
  • Our results: Lower bound for value iteration
    • There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, such that:
    • val(G) = 1 and
    • $\mathrm{val}(G_t) = 3m^{-N/2}$ for $t = 2^{m^{N/2}}$
  • Our results: Upper bound for value iteration
    • For any concurrent reachability game G:
    • $\mathrm{val}(G) - \mathrm{val}(G_t) < \varepsilon$ for $t = (1/\varepsilon)^{m^{O(N)}}$
  • Value iteration example: values of the four positions in $G_t$, listed from the position nearest a 1-entry down to the start position S
    • $G_0$: 0, 0, 0, 0
    • $G_1$: 0.33333, 0, 0, 0
    • $G_2$: 0.33333, 0.11111, 0, 0
    • $G_3$: 0.33333, 0.11111, 0.03704, 0
    • $G_4$: 0.33333, 0.11111, 0.03704, 0.01235
    • $G_5$: 0.33748, 0.11533, 0.04147, 0.01754
    • $G_6$: 0.33925, 0.11855, 0.04493, 0.02172
    • $G_7$: 0.34068, 0.12064, 0.04772, 0.02519
    • $G_8$: 0.34187, 0.12388, 0.04991, 0.02815
    • $G_9$: 0.34378, 0.12517, 0.05129, 0.03070
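The value iteration example can be replayed with a short sketch. It rests on two assumptions that are not stated on the slides: that the example game is a guessing game with four positions and three choices per player, where a correct guess advances Dante, an overshoot gives 0, and an undershoot points back to S; and that each matrix can be solved with a closed form obtained by equalizing payoffs, rather than with a general matrix game solver. The names `triangular_value` and `value_iteration` are illustrative. Under these assumptions the output agrees with the slide values for the early iterations, for instance $G_5$.

```python
def triangular_value(d, s, m):
    """Value of the m x m game paying d on a correct guess, 0 on an overshoot, s on an undershoot.

    Assumes 0 <= s <= d; equalizing strategies then give d / (1 + r + ... + r^(m-1))
    with r = 1 - s/d.
    """
    if d == 0:
        return 0.0
    r = 1.0 - s / d
    return d / sum(r ** k for k in range(m))

def value_iteration(n_positions, m, t):
    """Values in G_t, listed from the start position S to the position nearest the goal."""
    v = [0.0] * n_positions  # G_0: Dante has no moves left, so every value is 0
    for _ in range(t):
        start = v[0]
        v = [triangular_value(v[i + 1] if i + 1 < n_positions else 1.0, start, m)
             for i in range(n_positions)]
    return v

for t in (1, 4, 5):
    print(t, [round(x, 5) for x in value_iteration(4, 3, t)])
```

Each iteration only needs the values of the previous one, which is what makes the algorithm simple and also what makes it slow: information about the goal spreads by one position per iteration.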
  • Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06)
    • Was conjectured to be fast
  • Our results: Upper bound for strategy iteration
    • An ε-optimal strategy is computed after $t = (1/\varepsilon)^{m^{O(N)}}$ iterations of strategy iteration
    • This follows from the corresponding results for value iteration
  • Our results: Lower bound for strategy iteration
    • There exists a concurrent reachability game G, with N matrices, for large N, and m rows and columns in each matrix, such that:
    • val(G) = 1 and
    • The strategy obtained by strategy iteration guarantees winning probability at most $4m^{-N/2}$ for $t = 2^{m^{N/4}}$
    • Strategy iteration, m = 2: number of iterations needed to get over 1/2
      • N = 7: 18446744073709551617
      • N = 8: 340282366920938463463374607431768211457
      • N = 9: 115792089237316195423570985008687907853269984665640564039457584007913129639937
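The three iteration counts listed for m = 2 can be checked against a doubly exponential pattern. The sketch below tests whether each count equals 2^(2^(N-1)) + 1; this formula is only an observation fitted to the three numbers shown, not a statement taken from the slides.

```python
# Iteration counts listed on the slide for strategy iteration with m = 2.
iterations_needed = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}

for n, count in sorted(iterations_needed.items()):
    # Hypothetical pattern: 2^(2^(N-1)) + 1, i.e. the exponent doubles with each extra matrix.
    fits = count == 2 ** (2 ** (n - 1)) + 1
    print(n, count.bit_length(), fits)
```

Adding one matrix squares the number of iterations (up to the additive 1), which is the doubly exponential behaviour the lower bound describes.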
  • Strategy iteration: Before iteration 1
    • Start strategy for Dante := Uniform, i.e. (0.33333, 0.33333, 0.33333) in each of the four positions
  • Strategy iteration: Iteration 1
    • Best response for Lucifer
    • Calculate values from those strategies
    • Update strategy for Dante
    • [Figure: graph of the play under the current strategies. The numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used.]
    • Values (from the position nearest a 1-entry down to S): 0.33333, 0.11111, 0.03704, 0.01235
    • Updated strategies for Dante (same order): (0.33748, 0.33332, 0.32920), (0.34599, 0.33317, 0.32084), (0.37327, 0.33180, 0.29493), (0.47368, 0.31579, 0.21053)
  • Strategy iteration: Iteration 2
    • Values (from the position nearest a 1-entry down to S): 0.33748, 0.11677, 0.04359, 0.02065
    • Updated strategies for Dante (same order): (0.34031, 0.33329, 0.32640), (0.35458, 0.33289, 0.31253), (0.39987, 0.32917, 0.27096), (0.55453, 0.29186, 0.15361)
  • Strategy iteration: Iteration 3
    • Values (from the position nearest a 1-entry down to S): 0.34031, 0.12067, 0.04825, 0.02676
    • Updated strategies for Dante (same order): (0.34241, 0.33325, 0.32434), (0.36097, 0.33259, 0.30644), (0.41947, 0.32646, 0.25407), (0.60831, 0.27098, 0.12071)
  • Strategy iteration: Iteration 4
    • Values (from the position nearest a 1-entry down to S): 0.34241, 0.12360, 0.05185, 0.03154
    • Updated strategies for Dante (same order): (0.34407, 0.33322, 0.32271), (0.36601, 0.33230, 0.30169), (0.43486, 0.32390, 0.24125), (0.64720, 0.25350, 0.09930)
  • Strategy iteration: Iteration 5
    • Values (from the position nearest a 1-entry down to S): 0.34407, 0.12593, 0.05476, 0.03544
    • Updated strategies for Dante (same order): (0.34543, 0.33319, 0.32138), (0.37015, 0.33202, 0.29783), (0.44745, 0.32152, 0.23103), (0.67692, 0.23882, 0.08426)
  • Strategy iteration: Iteration 6
    • Values (from the position nearest a 1-entry down to S): 0.34543, 0.12786, 0.05721, 0.03873
    • Updated strategies for Dante (same order): (0.34658, 0.33316, 0.32026), (0.37366, 0.33177, 0.29457), (0.45807, 0.31933, 0.22260), (0.70055, 0.22633, 0.07312)
  • Strategy iteration: Iteration 7
    • Values (from the position nearest a 1-entry down to S): 0.34658, 0.12950, 0.05932, 0.04156
    • Updated strategies for Dante (same order): (0.34758, 0.33313, 0.31929), (0.37670, 0.33153, 0.29177), (0.46723, 0.31730, 0.21547), (0.71988, 0.21557, 0.06455)
  • Strategy iteration: Iteration 8
    • Values (from the position nearest a 1-entry down to S): 0.34758, 0.13093, 0.06118, 0.04404
    • Updated strategies for Dante (same order): (0.34845, 0.33311, 0.31844), (0.37937, 0.33130, 0.28933), (0.47527, 0.31541, 0.20932), (0.73606, 0.20618, 0.05776)
  • Strategy iteration: Iteration 9
    • Values (from the position nearest a 1-entry down to S): 0.34845, 0.13219, 0.06283, 0.04624
    • Updated strategies for Dante (same order): (0.34923, 0.33309, 0.31768), (0.38176, 0.33109, 0.28715), (0.48241, 0.31366, 0.20393), (0.74985, 0.19791, 0.05224)
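The three steps of each round (best response for Lucifer, values, update for Dante) can be sketched for the same kind of guessing game. Several things here are assumptions rather than the authors' implementation: the game structure (a correct guess advances, an overshoot loses, an undershoot returns to S), the brute-force enumeration of Lucifer's pure stationary replies, the closed-form equalizing strategy used for the update, and the choice to update every position in every round. All function names are illustrative. Under these assumptions the first rounds reproduce the values shown on the slides.

```python
from itertools import product

def best_response_values(strategies):
    """Position values when Lucifer best-responds to Dante's stationary strategies.

    strategies[i][a] is the probability that Dante guesses a+1 in position i
    (position 0 is S). Assumes every guess has positive probability.
    """
    n, m = len(strategies), len(strategies[0])
    best = [1.0] * n
    for reply in product(range(m), repeat=n):  # only feasible for tiny games
        hit = [strategies[i][reply[i]] for i in range(n)]          # Dante advances
        back = [sum(strategies[i][:reply[i]]) for i in range(n)]   # Dante returns to S
        # Write x_i = a_i + b_i * x_0 and fill in from the goal backwards.
        a, b = [0.0] * (n + 1), [0.0] * (n + 1)
        a[n] = 1.0
        for i in reversed(range(n)):
            a[i] = hit[i] * a[i + 1]
            b[i] = hit[i] * b[i + 1] + back[i]
        x0 = a[0] / (1.0 - b[0])
        best = [min(best[i], a[i] + b[i] * x0) for i in range(n)]
    return best

def equalizing_strategy(d, s, m):
    """Dante's equalizing guess distribution when a hit is worth d and an undershoot s (s < d)."""
    r = 1.0 - s / d
    weights = [r ** k for k in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

def strategy_iteration(n, m, rounds):
    """Return the values computed in the last round and Dante's strategies after its update."""
    strategies = [[1.0 / m] * m for _ in range(n)]  # start from uniform
    values = None
    for _ in range(rounds):
        values = best_response_values(strategies)
        strategies = [equalizing_strategy(values[i + 1] if i + 1 < n else 1.0, values[0], m)
                      for i in range(n)]
    return values, strategies

values, strategies = strategy_iteration(4, 3, 2)
print([round(x, 5) for x in values])
```

The lists are ordered from S to the position nearest the goal, so they read in the reverse order of the slide listings.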
  • Generalized Purgatory P(N,m)
    • Lucifer repeatedly hides a number between 1 and m.
    • Dante must try to guess the number.
    • If he guesses correctly N times in a row, he goes to heaven.
    • If he ever guesses incorrectly by overshooting Lucifer’s number, he goes to hell.
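The rules of P(N,m) translate directly into one matrix per position. The sketch below builds those matrices symbolically. The slide does not say what happens when Dante undershoots; sending him back to the first position is an assumption, chosen because it breaks the run of correct guesses. The names `purgatory_entry` and `purgatory`, and the row and column orientation, are illustrative.

```python
def purgatory_entry(position, guess, hidden, n_rounds):
    """Outcome in `position` (1..n_rounds) when Dante guesses `guess` and Lucifer hid `hidden`."""
    if guess > hidden:
        return "hell"    # overshooting: Dante loses (a 0 entry)
    if guess < hidden:
        return "start"   # assumption: undershooting sends Dante back to position 1
    # Correct guess: one step closer, or heaven (a 1 entry) after the N-th in a row.
    return "heaven" if position == n_rounds else position + 1

def purgatory(n_rounds, m):
    """All matrices of P(N, m); rows are Dante's guesses, columns Lucifer's hidden numbers."""
    return {pos: [[purgatory_entry(pos, g, h, n_rounds) for h in range(1, m + 1)]
                  for g in range(1, m + 1)]
            for pos in range(1, n_rounds + 1)}

game = purgatory(4, 3)
for row in game[4]:
    print(row)
```

Each matrix is triangular: one triangle is hell, the other leads back to the start, and only the diagonal makes progress.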
  • Interesting fact
    • The probability that Dante goes to heaven from purgatory is nearly 1, if he plays well enough.
  • Exemplifying important facts: value iteration on 1 matrix, strategy iteration on 1 matrix, strategy iteration on 3 matrices (two choices per player; values and strategies for 3 matrices are listed from the matrix nearest the goal down to the start matrix)
    • t = 0: value iteration starts from value 0; strategy iteration starts from the uniform strategy (0.5, 0.5) in every matrix
    • t = 1:
      • Value iteration on 1 matrix: value 0.5
      • Strategy iteration on 1 matrix: value 0.5, updated strategy (0.66667, 0.33333)
      • Strategy iteration on 3 matrices: values 0.5, 0.25, 0.125; updated strategies (0.53333, 0.46667), (0.57143, 0.42857), (0.66667, 0.33333)
    • t = 2:
      • Value iteration on 1 matrix: value 0.66667
      • Strategy iteration on 1 matrix: value 0.66667, updated strategy (0.75000, 0.25000)
      • Strategy iteration on 3 matrices: values 0.53333, 0.30476, 0.20317; updated strategies (0.55654, 0.44346), (0.61765, 0.38235), (0.75000, 0.25000)
    • t = 3:
      • Value iteration on 1 matrix: value 0.75000
      • Strategy iteration on 1 matrix: value 0.75000, updated strategy (0.80000, 0.20000)
      • Strategy iteration on 3 matrices: values 0.55654, 0.34374, 0.25781; updated strategies (0.57399, 0.42601), (0.65072, 0.34928), (0.80000, 0.20000)
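The single-matrix values 0.5, 0.66667, 0.75 follow a simple recurrence, sketched below with exact fractions. It assumes the matrix has two choices per player, pays 1 on a correct guess and 0 on an overshoot, and points back to itself on an undershoot. If the current value is v and Dante guesses 1 with probability p, he gets p when Lucifer hides 1 and (1 - p) + p·v when Lucifer hides 2; equalizing the two gives p = 1/(2 - v), which is also the new value. The function name is illustrative.

```python
from fractions import Fraction

def one_matrix_values(steps):
    """Value iteration on the single 2 x 2 matrix: v_{t+1} = 1 / (2 - v_t), starting from 0."""
    v = Fraction(0)
    values = []
    for _ in range(steps):
        v = 1 / (2 - v)
        values.append(v)
    return values

print(one_matrix_values(4))
```

The computed terms equal t/(t+1), so even with one matrix, getting within ε of the true value 1 takes on the order of 1/ε iterations.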
  • The end
    • Open problems:
    • Find a fast algorithm for the problem
      • There exists a PSPACE algorithm for the problem, but it is not fast.
    • Thanks for listening