Csr2011 june14 16_30_ibsen-jensen

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration

    Csr2011 june14 16_30_ibsen-jensen Presentation Transcript

    • The complexity of solving reachability games using value and strategy iteration. Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen. Aarhus University, Denmark. CSR 2011, 14th June
    • Overview
      • What are concurrent reachability games?
      • Two standard algorithms for solving concurrent reachability games:
        • The value iteration algorithm
        • The strategy iteration algorithm
      • Exemplify important facts for the proof of the time lower bound for both algorithms
    • Matrix games (von Neumann 1928). Example 3×3 matrix, entries as listed on the slide: 0 1 -1 / -1 0 1 / 1 -1 0
    • Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998). Dante* vs. Lucifer*. Each entry can be either 0, 1 or a pointer. [Figure: example game with start position S; its matrices have entries 0, 1 and pointers, some of them pointers to S] * Naming convention from Hansen, Koucky and Miltersen, 2009
    • Histories [Figure: the example game with start position S]
    • Histories and strategies
      • History: Sequence of positions and choices for each player in each position.
      • Strategy: Map from histories to probability distributions over choices in the position we arrive at after the history
      • S_1: Set of strategies for Dante
      • S_2: Set of strategies for Lucifer
      • H_1 / H_2: Sets of stationary strategies (strategies that depend only on the position we arrive at after the history)
    • Payoffs
      • v(i, σ, π): The probability of eventually reaching a 1 from position i, if Dante plays by strategy σ and Lucifer by π.
    • Value of i (Everett 1957)
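The formula shown on this slide did not survive in the transcript. As a hedged reconstruction, the standard way to state the value of position i, using the payoff v(i, σ, π) and the strategy sets S_1 and S_2 defined on the previous slides, is the following; the exact form used on the slide is an assumption.

```latex
v_i \;=\; \sup_{\sigma \in S_1} \, \inf_{\pi \in S_2} v(i, \sigma, \pi)
    \;=\; \inf_{\pi \in S_2} \, \sup_{\sigma \in S_1} v(i, \sigma, \pi)
```

The supremum cannot in general be replaced by a maximum, which is why the later slides ask for strategies that come within ε of v_i rather than for optimal strategies.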
    • Algorithmic problems
      • Quantitatively solving a game: Given the game, compute the value of all positions.
      • Strategically solving a game: Given the game and ε > 0, compute σ such that for all π and i: v(i, σ, π) > v_i - ε.
    • Value iteration (Shapley 1953)
      • Value iteration computes the value of each position in G_t in iteration t, on the basis of the value of each position in G_{t-1}.
      • G_t: A modified version of G, where Dante loses after t moves.
    • Our results: Lower bound for value iteration
      • There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, so that:
      • val(G) = 1 and
      • val(G_t) ≤ 3 m^{-N/2}, for t = 2^{m^{N/2}}
    • Our results: Upper bound for value iteration
      • For any concurrent reachability game G:
      • val(G) - val(G_t) < ε for t = (1/ε)^{m^{O(N)}}
    • Value iteration example: values of the four positions of the example game (start position S) in G_0, ..., G_9, one row per iteration; positions are ordered from S to the last matrix
      t   S        2nd      3rd      last
      0   0        0        0        0
      1   0        0        0        0.33333
      2   0        0        0.11111  0.33333
      3   0        0.03704  0.11111  0.33333
      4   0.01235  0.03704  0.11111  0.33333
      5   0.01754  0.04147  0.11533  0.33748
      6   0.02172  0.04493  0.11855  0.33925
      7   0.02519  0.04772  0.12064  0.34068
      8   0.02815  0.04991  0.12388  0.34187
      9   0.03070  0.05129  0.12517  0.34378
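The early frames of this example can be reproduced with a short Python sketch. It assumes that the example game is Generalized Purgatory (described on a later slide) with N = 4 positions and m = 3 numbers, and it additionally assumes that an undershooting guess sends Dante back to the start position S; the S entries in the matrices suggest this, but the transcript does not state it. The closed form in `stage_value` was derived only for this particular matrix shape and is not a general matrix game solver. All function names are illustrative.

```python
def stage_value(c, s, m):
    """Value of the m x m stage game of one Purgatory position.

    Assumed payoffs: c if Dante guesses correctly, s if he undershoots
    (back to the start position), 0 if he overshoots (hell).
    The closed form below requires 0 <= s <= c.
    """
    if c <= 0.0:
        return 0.0
    r = (c - s) / c
    # Dante's equalizing strategy is proportional to r**0, r**1, ..., r**(m-1).
    return c / sum(r ** k for k in range(m))


def value_iteration(N, m, t):
    """Values of positions 1..N in G_t (index 0 is the start position S)."""
    values = [0.0] * N  # G_0: Dante has already lost
    for _ in range(t):
        # All positions are updated simultaneously from the previous values.
        values = [
            stage_value(values[i + 1] if i + 1 < N else 1.0, values[0], m)
            for i in range(N)
        ]
    return values


if __name__ == "__main__":
    for t in range(6):
        print(t, [round(v, 5) for v in value_iteration(4, 3, t)])
```

Under these assumptions the row printed for t = 5 agrees with the G_5 frame (0.01754, 0.04147, 0.11533, 0.33748, from S to the last matrix). The sketch is not claimed to match every digit of the later frames.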
    • Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06). Was conjectured to be fast.
    • Our results: Upper bound for strategy iteration
      • An ε-optimal strategy is computed after t = (1/ε)^{m^{O(N)}} iterations of strategy iteration
      • This follows from the corresponding results for value iteration
    • Our results: Lower bound for strategy iteration
      • There exists a concurrent reachability game G, with N matrices, for large N, and m rows and columns in each matrix, so that:
      • val(G) = 1 and
      • The strategy obtained by strategy iteration guarantees winning probability at most 4 m^{-N/2}, for t = 2^{m^{N/4}}
      • Strategy iteration, m = 2: number of iterations needed to get over 1/2
        N = 7: 18446744073709551617
        N = 8: 340282366920938463463374607431768211457
        N = 9: 115792089237316195423570985008687907853269984665640564039457584007913129639937
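The three iteration counts in this table are easier to appreciate next to powers of two. The Python sketch below checks an observation about them: each equals 2^(2^(N-1)) + 1. This closed form is only a pattern noticed in the three listed entries; the talk does not state it, and the function name `pattern` is hypothetical.

```python
# Iteration counts copied from the slide (strategy iteration, m = 2).
slide_counts = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}


def pattern(n):
    """Hypothetical closed form 2^(2^(n-1)) + 1 fitted to the three table entries."""
    return 2 ** (2 ** (n - 1)) + 1


if __name__ == "__main__":
    for n, count in sorted(slide_counts.items()):
        # bit_length shows how the size of the count doubles with each step in N.
        print(n, count == pattern(n), count.bit_length())
```

Python integers have arbitrary precision, so the comparison is exact.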
    • Strategy iteration on the example game (start position S)
      • Start strategy for Dante := Uniform (probability 0.33333 for each of the three choices in each of the four positions)
      • Each iteration:
        • Best response for Lucifer
        • Calculate values from those strategies
        • Update strategy for Dante
      • In the figure, the numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used.
      • Values calculated in each iteration (positions ordered from S to the last matrix):
        iteration  S        2nd      3rd      last
        1          0.01235  0.03704  0.11111  0.33333
        2          0.02065  0.04359  0.11677  0.33748
        3          0.02676  0.04825  0.12067  0.34031
        4          0.03154  0.05185  0.12360  0.34241
        5          0.03544  0.05476  0.12593  0.34407
        6          0.03873  0.05721  0.12786  0.34543
        7          0.04156  0.05932  0.12950  0.34658
        8          0.04404  0.06118  0.13093  0.34758
        9          0.04624  0.06283  0.13219  0.34845
      • Dante's strategy after the update in each iteration (three probabilities per position, as shown on the slides):
        1: S (0.47368 0.31579 0.21053), 2nd (0.37327 0.33180 0.29493), 3rd (0.34599 0.33317 0.32084), last (0.33748 0.33332 0.32920)
        2: S (0.55453 0.29186 0.15361), 2nd (0.39987 0.33180 0.32917), 3rd (0.35458 0.33289 0.31253), last (0.34031 0.33329 0.32640)
        3: S (0.60831 0.27098 0.12071), 2nd (0.41947 0.32646 0.25407), 3rd (0.36097 0.33259 0.30644), last (0.34241 0.33325 0.32434)
        4: S (0.64720 0.25350 0.09930), 2nd (0.43486 0.32390 0.24125), 3rd (0.36601 0.33230 0.30169), last (0.34407 0.33322 0.32271)
        5: S (0.67692 0.23882 0.08426), 2nd (0.44745 0.32152 0.23103), 3rd (0.37015 0.33202 0.29783), last (0.34543 0.33319 0.32138)
        6: S (0.70055 0.22633 0.07312), 2nd (0.45807 0.31933 0.22260), 3rd (0.37366 0.33177 0.29457), last (0.34658 0.33316 0.32026)
        7: S (0.71988 0.21557 0.06455), 2nd (0.46723 0.31730 0.21547), 3rd (0.37670 0.33153 0.29177), last (0.34758 0.33313 0.31929)
        8: S (0.73606 0.20618 0.05776), 2nd (0.47527 0.31541 0.20932), 3rd (0.37937 0.33130 0.28933), last (0.34845 0.33311 0.31844)
        9: S (0.74985 0.19791 0.05224), 2nd (0.48241 0.31366 0.20393), 3rd (0.38176 0.33109 0.28715), last (0.34923 0.33309 0.31768)
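The three steps of each iteration (best response for Lucifer, calculate values, update strategy for Dante) can be sketched in Python for this one example. The sketch assumes the example game is Generalized Purgatory with N = 4 and m = 3, and that an undershooting guess returns Dante to the start position S; the second assumption is not stated in the transcript. Lucifer's best response is computed by a simple fixed-point iteration, and Dante's update uses a closed form that holds only for this matrix shape, so this is an illustration and not the authors' implementation. All names are hypothetical.

```python
def matrix_game(c, s, m):
    """Value and an optimal strategy for Dante in one Purgatory stage game.

    Assumed payoffs: c for a correct guess, s for undershooting, 0 for
    overshooting. The closed form requires 0 <= s <= c.
    """
    if c <= 0.0:
        return 0.0, [1.0 / m] * m
    r = (c - s) / c
    weights = [r ** k for k in range(m)]  # guess 1 gets the largest weight
    total = sum(weights)
    return c / total, [w / total for w in weights]


def best_response_values(strategy, N, m, tol=1e-15, max_sweeps=100_000):
    """Values when Lucifer best-responds to Dante's stationary strategy."""
    v = [0.0] * N
    for _ in range(max_sweeps):
        new = []
        for i in range(N):
            c = v[i + 1] if i + 1 < N else 1.0
            p = strategy[i]
            # Lucifer hides h: Dante is correct w.p. p[h], undershoots w.p. sum(p[:h]).
            new.append(min(p[h] * c + sum(p[:h]) * v[0] for h in range(m)))
        converged = max(abs(a - b) for a, b in zip(new, v)) < tol
        v = new
        if converged:
            break
    return v


def strategy_iteration(N, m, iterations):
    """Return a list of (values, updated strategy) pairs, one per iteration."""
    strategy = [[1.0 / m] * m for _ in range(N)]  # start strategy: uniform
    history = []
    for _ in range(iterations):
        v = best_response_values(strategy, N, m)
        for i in range(N):
            c = v[i + 1] if i + 1 < N else 1.0
            value, optimal = matrix_game(c, v[0], m)
            if value > v[i] + 1e-12:  # switch only where the stage game improves
                strategy[i] = optimal
        history.append((v, [row[:] for row in strategy]))
    return history


if __name__ == "__main__":
    for k, (v, strat) in enumerate(strategy_iteration(4, 3, 3), start=1):
        print(k, [round(x, 5) for x in v])
```

Index 0 in each list is the start position S. Under the stated assumptions, the values of the first two iterations agree with the slides to the displayed precision.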
    • Generalized Purgatory P(N,m)
      • Lucifer repeatedly hides a number between 1 and m.
      • Dante must try to guess the number.
      • If he guesses correctly N times in a row, he goes to heaven.
      • If he ever guesses incorrectly by overshooting Lucifer’s number, he goes to hell.
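The rules above can be written down as one matrix per position, in the "0, 1 or a pointer" form used earlier in the talk. The Python sketch below is one possible encoding. It assumes that rows are Dante's guesses and columns are Lucifer's hidden numbers, and that an undershooting guess resets Dante to the first position; the rules on the slide do not say what happens on an undershoot, so that part is an assumption. The function name and the `("goto", k)` pointer representation are hypothetical.

```python
def purgatory_matrices(N, m):
    """Build the N stage matrices of Generalized Purgatory P(N, m).

    Entry [g][h] is the outcome when Dante guesses g + 1 and Lucifer hid h + 1:
    1 means heaven, 0 means hell, ("goto", k) is a pointer to position k.
    """
    matrices = []
    for pos in range(N):
        # The N-th correct guess in a row wins; earlier ones move one position on.
        on_correct = 1 if pos == N - 1 else ("goto", pos + 1)
        matrix = [
            [
                on_correct if g == h
                else 0 if g > h          # overshooting: hell
                else ("goto", 0)         # undershooting: back to the start (assumed)
                for h in range(m)
            ]
            for g in range(m)
        ]
        matrices.append(matrix)
    return matrices


if __name__ == "__main__":
    for pos, matrix in enumerate(purgatory_matrices(4, 3)):
        print("position", pos)
        for row in matrix:
            print("  ", row)
```

With N = 4 and m = 3 this gives four 3×3 matrices, the same size as the example game used in the iteration examples.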
    • Interesting fact
      • The probability that Dante goes to heaven from purgatory is nearly 1, if he plays well enough.
    • Exemplifying important facts: value iteration on 1 matrix, strategy iteration on 1 matrix, and strategy iteration on 3 matrices, shown side by side for t = 0, ..., 3
      • Value iteration on 1 matrix, value after step t:
        t=0: 0
        t=1: 0.5
        t=2: 0.66667
        t=3: 0.75000
      • Strategy iteration on 1 matrix (start strategy 0.5 0.5), value and updated strategy in step t:
        t=1: value 0.5, strategy (0.66667 0.33333)
        t=2: value 0.66667, strategy (0.75000 0.25000)
        t=3: value 0.75000, strategy (0.80000 0.20000)
      • Strategy iteration on 3 matrices (start strategy 0.5 0.5 in every matrix), values and updated strategies in step t, as listed on the slides:
        t=1: values 0.5 0.25 0.125; strategies (0.66667 0.33333) (0.57143 0.42857) (0.53333 0.46667)
        t=2: values 0.53333 0.30476 0.20317; strategies (0.75000 0.25000) (0.61765 0.38235) (0.55654 0.44346)
        t=3: values 0.55654 0.34374 0.25781; strategies (0.80000 0.20000) (0.65072 0.34928) (0.57399 0.42601)
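The "value iteration on 1 matrix" column can be reproduced exactly with a few lines of Python. The sketch assumes a single 2×2 Purgatory matrix in which a correct guess wins, an overshoot loses, and an undershoot returns to the same position; the transcript does not spell out this matrix, so the setup is an assumption. For that matrix, with s the previous value, the stage game has value 1 / (2 - s), which is the update used below.

```python
from fractions import Fraction


def one_matrix_value_iteration(steps):
    """Exact value-iteration iterates v_0, ..., v_steps for the assumed 2x2 matrix."""
    v = Fraction(0)  # G_0: Dante has already lost
    values = [v]
    for _ in range(steps):
        # Stage matrix: 1 on the diagonal, v for undershooting, 0 for overshooting.
        v = 1 / (2 - v)
        values.append(v)
    return values


if __name__ == "__main__":
    for t, v in enumerate(one_matrix_value_iteration(5)):
        print(t, v, float(v))
```

In this sketch the iterates are 0, 1/2, 2/3, 3/4, ..., that is t / (t + 1), so the distance to the value 1 shrinks only like 1 / (t + 1). The first iterates agree with the numbers 0.5, 0.66667 and 0.75000 on the slides.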
    • The end
      • Open problems:
      • Find a fast algorithm for the problem
        • There exists a PSPACE algorithm for the problem, but it is not fast.
      • Thanks for listening