1532 0545-2001-02-01-0050
1. TRICK
Building a Better Game through Dynamic Programming: A Flip Analysis
Building a Better Game through Dynamic Programming: A Flip Analysis
Michael A. Trick
Graduate School of Industrial Administration
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213, USA
trick@cmu.edu
Abstract
Flip is a solitaire board game produced by craft
woodworkers. We analyze Flip and suggest modifications
to the rules to make the game more marketable.
In addition to being an interesting application
of dynamic programming, this case shows the use
of operations research in managerial decision
making.
________________________________________
Flip and Flip-Flop
Before reading more about Flip, I invite you to play
a few games. The rules are very simple: you begin
with the numbers 1 through 12 face up. Every turn,
two dice are rolled. You choose values whose
sum equals the roll (simply click on them, and
“accept” the choices; if you make a mistake, click
on a tile a second time to cancel your choice) and
turn them face down. You continue until either all
values are face down (you win!) or there is no combination
of face-up values that equals the roll (you
lose). As you play this game, try to figure out an
optimal strategy, and estimate the chance of
winning Flip.
Now, try your hand at Flip-Flop. Here the goal is to
get all of the red tiles flipped over to their green
sides. A tile can be flipped (red to green) or flopped
(green to red) as long as the total of the tiles turned matches
the roll of the dice. How long do you think it will
take (on average) to turn all the tiles from red to green?
1. Introduction
In the past, I have found dynamic programming a
difficult topic to cover in my MBA-level operations
research courses. While applications of
dynamic programming abound, many of them
degenerate into a flurry of notation when addressed
in class. I needed an example of a problem that
was difficult enough that the use of a formal
quantitative process was justified, but simple
enough that the class would not be overwhelmed.
In the following, I present an analysis of Flip, a
solitaire board game produced by many craft
woodworkers and widely available at craft shows.
The model is relatively complicated, and beyond
easy formulation in a spreadsheet. It is simple
enough, however, that students with knowledge of
a programming language can do this analysis as a
term project or can modify a working program to
analyze variants easily.
One difficulty with using games as an example in
class is that they lack realism to students, particularly
those seeking an MBA. After all, we are not
training students to better spend their recreational
time! I therefore present this problem as I, in fact,
received it: as a question of game design and evaluation
to enhance marketing. This hits much closer
to home with most students. The result is an
example of not just how to do Operations Research,
but why.
2. Problem Description
G&E Woodworking* is a small-scale craft shop in
Canada that produces a number of wood items for
sale at craft shows. One partner (G) does the
design, cutting, and assembly of the items, while
the other (E) is responsible for staining and final
presentation. G&E sells only during the fall craft
season leading up to Christmas. Every spring G&E
chooses a number of products for sale in the
upcoming year. The summer is spent producing the
items by hand for sale in the fall. No more than 5 or
6 different items are produced each year due to
production line limitations.
INFORMS Transactions on Education 2:1 (50-58) ©INFORMS
In the spring of 1999, G&E considered producing a
solitaire board game which I will call Flip. Flip met
the basic requirements of G&E: it was attractive,
novel within the local market, and could be produced
by G. As a game, however, Flip caused G&E to
worry. The rule set they received with the plans was
very terse, and it was not clear that the game was
playable. The purpose of this analysis is to
determine the playability of this game, and to
recommend rule modifications (if any) to improve
its playability.
A Flip game consists of a folding box and a set of
two dice (as shown above). The box opens flat into
a top portion and a bottom portion. Across the top
of the top portion are twelve small wooden tiles,
each of which pivots on a common bar. On each tile
is a number from 1 to 12, with each number represented
exactly once. The bottom portion of the box
is lined in felt.
The game begins with all the numbers visible.
During each round, the dice are rolled in the bottom
part of the box. The player then chooses a subset of
tiles to flip, so that their values are no longer
visible. These tiles must total the roll of the dice.
So, in the initial configuration of all numbers
visible, a roll of 7 gives legal moves “7”, “1-6”,
“2-5”, “3-4”, and “1-2-4”. The game continues until
either all numbers are flipped (the player wins) or
there is no legal move (the player loses).
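The move rule just described is easy to mechanize. The sketch below is written in C, one of the two languages of the author's programs listed in Appendix II, but it is a hypothetical illustration rather than code from flip.c. It assumes that tile i is stored in bit i-1 of an unsigned mask (a set bit meaning the tile is still visible) and finds the legal moves by walking the submasks of the current state.

```c
#include <assert.h>

#define NTILES 12
#define FULL ((1u << NTILES) - 1u) /* all twelve tiles visible */

/* Total value of the tiles whose bits are set (bit i-1 stands for tile i). */
static int tile_sum(unsigned mask) {
    int sum = 0;
    for (int i = 0; i < NTILES; i++)
        if (mask & (1u << i)) sum += i + 1;
    return sum;
}

/* Stores in moves[] every legal move for `roll`, i.e. every subset of the
   visible tiles in `state` whose values total the roll, and returns how
   many were stored (at most `cap`). The update u = (u - 1) & state visits
   each nonempty submask of `state` exactly once, largest mask first. */
static int legal_moves(unsigned state, int roll, unsigned moves[], int cap) {
    assert(roll >= 2 && roll <= 12);
    int n = 0;
    for (unsigned u = state; u != 0; u = (u - 1) & state)
        if (tile_sum(u) == roll && n < cap)
            moves[n++] = u;
    return n;
}
```

Under these assumptions, legal_moves(FULL, 7, moves, 16) returns 5, listing the masks for “7”, “1-6”, “2-5”, “3-4”, and “1-2-4” in that order, and the final position of the sample game below has no legal move for a roll of 6. A 16-entry buffer suffices because no roll ever has more than 15 legal moves (a 12 from the opening position has exactly 15).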
A sample game is as follows:
Position: 1 2 3 4 5 6 7 8 9 10 11 12
Roll: 9
Decision: Flip 1 and 8
Position: _ 2 3 4 5 6 7 _ 9 10 11 12
Roll: 8
Decision: Flip 2 and 6
Position: _ _ 3 4 5 _ 7 _ 9 10 11 12
Roll: 7
Decision: Flip 7
Position: _ _ 3 4 5 _ _ _ 9 10 11 12
Roll: 6
No move. Player loses.
3. Good Solitaire Games
Before we can analyze this game, we should define
what we want in a solitaire game. Flip already meets
the basic requirements of G&E Woodworking: it is
attractive and can be produced. The tactile feel of
flipping the tiles is also an advantage. Since physically
this seems like a good product, we will limit
ourselves to analyzing Flip as a game.
There are a number of properties we might want in
a solitaire game. These include:
Reasonableness: The player should have a reasonable
chance of winning the game, if she plays
optimally. Different players will differ on what they
deem reasonable, and what is reasonable for a
1-minute game would almost certainly not be
reasonable for a 3-hour game. Flip takes about 1
minute to play. We will define a reasonable version
of Flip to be one in which the chance of
winning is at least 1 game in 20 (under optimal play).
Responsiveness: The probability of winning should
depend on the player’s choices. Good decisions
should be rewarded with a high chance of winning,
while poor ones should sharply decrease that
chance.
Subtlety: The winning strategy should not be easily
summarized. Players should feel there are always
subtle improvements they can make in their actions.
We can certainly add to this list, but these requirements
seem a natural starting point.
* While G&E Woodworking exists, and the problem they faced was essentially the one presented, certain aspects of this case have
been fictionalized for expository purposes.
4. Analyzing Flip
In the appendix, we show how Flip can be formulated
as a stochastic dynamic program with 4096
states. In this section, I go through the results and
describe alternative designs for this game.
The first result is the player’s chance of winning
Flip as defined above, assuming the player plays
optimally. This probability is .00362218, or approximately
1 game in 276 (we will use the latter representation,
as it is easier to read). This number gets
to the heart of the reasonableness of Flip, and it
seems that Flip is not a reasonable game. With a
chance of winning only 1 in 276, it is clear that the
vast majority of buyers will lose interest in the game
long before winning any game.
To determine the responsiveness of Flip, we need
some heuristics against which to compare this probability
of winning. Two simple heuristics are Lex-Min
and Lex-Max, where the player takes the lexically
minimal (maximal, respectively) feasible choice,
comparing choices by their largest tile, then by their
next largest, and so on. In the starting position, for a
roll of 8, Lex-Min flips 4-3-1 and Lex-Max flips 8.
Both of these seem reasonable choices, and might be
adopted as heuristics by a player first faced with this game.
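Both heuristics are simple to state in code. The C sketch below uses hypothetical function names and assumes tile i is stored in bit i-1 (a set bit meaning face up). It relies on one observation: two different sets of tiles with the same total can never be prefixes of one another, so comparing choices from the largest tile down is the same as comparing their masks as integers. Lex-Max is then the largest qualifying mask and Lex-Min the smallest.

```c
#include <assert.h>

#define NTILES 12
#define FULL ((1u << NTILES) - 1u) /* all twelve tiles face up */

/* Total value of the tiles whose bits are set (bit i-1 stands for tile i). */
static int tile_sum(unsigned mask) {
    int sum = 0;
    for (int i = 0; i < NTILES; i++)
        if (mask & (1u << i)) sum += i + 1;
    return sum;
}

/* Lex-Max: the legal move with the largest top tile, ties broken by the
   next tile down, and so on; for a fixed total that is the largest mask.
   Submasks are visited in decreasing order, so the first hit is returned.
   Returns 0 when no legal move exists. */
static unsigned lex_max(unsigned state, int roll) {
    assert(roll >= 2 && roll <= 12);
    for (unsigned u = state; u != 0; u = (u - 1) & state)
        if (tile_sum(u) == roll) return u;
    return 0;
}

/* Lex-Min: the smallest qualifying mask, i.e. the last hit in the same scan. */
static unsigned lex_min(unsigned state, int roll) {
    assert(roll >= 2 && roll <= 12);
    unsigned best = 0;
    for (unsigned u = state; u != 0; u = (u - 1) & state)
        if (tile_sum(u) == roll) best = u;
    return best;
}
```

From the opening position with a roll of 8, lex_max returns the mask for 8 and lex_min the mask for 4-3-1, matching the choices described above.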
The chance of Lex-Min winning the game is 1 in
7671 and that for Lex-Max is 1 in 290. This speaks
well for responsiveness: poor decisions can lead to
very poor outcomes, as in the case of Lex-Min.
Flip does less well in subtlety, however, since the
simple rule Lex-Max does almost as well as the
optimal rule. In fact, for many positions and rolls, it
is easy to see that Lex-Max is the optimal rule. In
particular, if the number given by the roll is not
already flipped, it is always optimal to flip that
number. The only decision points where the rules
differ are in trying to choose pairs or triples that add
up to an unavailable number. The optimal rule in
these cases seems quite subtle. For instance, if we
look only at cases when an 8 is rolled without 8
being available and where any of 1-7, 2-6 or 3-5 can
be chosen, we get the following (each line lists the tiles still face up):
Cases with 3-5 optimal:
1 2 3 5 6 7 10
1 2 3 4 5 6 7 9
1 2 3 4 5 6 7 10
1 2 3 5 6 7 9 10
1 2 3 4 5 6 7 11
1 2 3 5 6 7 10 11
1 2 3 4 5 6 7 9 10
1 2 3 4 5 6 7 9 11
1 2 3 4 5 6 7 10 11
1 2 3 4 5 6 7 9 12
1 2 3 4 5 6 7 10 12
1 2 3 4 5 6 7 9 10 11
1 2 3 4 5 6 7 9 10 12
1 2 3 4 5 6 7 9 11 12
1 2 3 4 5 6 7 10 11 12
1 2 3 4 5 6 7 9 10 11 12
Cases with 2-6 optimal:
1 2 3 5 6 7
1 2 3 5 6 7 9
1 2 3 5 6 7 11
1 2 3 5 6 7 12
1 2 3 5 6 7 9 11
1 2 3 5 6 7 9 12
1 2 3 5 6 7 10 12
1 2 3 5 6 7 11 12
1 2 3 5 6 7 9 10 11
1 2 3 5 6 7 9 10 12
1 2 3 5 6 7 9 11 12
1 2 3 5 6 7 10 11 12
1 2 3 5 6 7 9 10 11 12
Cases with 1-7 optimal:
1 2 3 4 5 6 7
1 2 3 4 5 6 7 12
1 2 3 4 5 6 7 11 12
This seems very subtle indeed!
The difference between Lex-Max and the optimal
decision can be significant. The following are some
examples:
1 2 _ 4 5 6 _ 8 _ __ __ __
Roll 11
Optimal (6,5): 1 in 8.6;
Lex-Max (8,2,1): 1 in 15.1
1 2 _ 4 5 6 _ _ _ 10 11 __
Roll 9
Optimal (5,4): 1 in 119.3;
Lex-Max (6,2,1): 1 in 243
1 _ 3 _ _ 6 _ 8 _ 10 __ __
Roll 9
Optimal (6,3): 1 in 29.4;
Lex-Max (8,1): 1 in 44.7
5. Improving the Game
While the optimal strategy is not obvious, the facts
remain that even with best play the player is
unlikely to win, and the strategy of always
choosing the lexically maximum choice does almost
as well as the optimal strategy.
In fact, the low chance of winning this game is not
surprising. Consider the end of the game when only
one number is visible (it is not necessary that a game
reach these states, but they are hard to avoid). At
best, the remaining number will be a 7, leading to a
1 in 6 chance of winning on the last roll. All other
possibilities are less likely, down to the impossibility
of rolling a 1 with two dice.
We can improve the player’s chances by allowing a
limited number of “failing” rolls. If the player flips
all the numbers using no more than the allotted
number of failing rolls, then she is declared a
“winner”. With just the 7 remaining and 5 permitted
failures, the player has an almost 2 in 3 chance of
winning, while leaving the 2 with 5 permitted
failures has less than a 1 in 6 chance.
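Both figures follow from a short calculation. With b failures allowed and a single tile t left, the player gets b + 1 rolls, each succeeding with probability p(t) = (6 - |t - 7|)/36, so the chance of winning is 1 - (1 - p(t))^(b+1). A minimal C sketch of that formula (the function names are illustrative):

```c
#include <assert.h>

/* Probability that two fair dice total r: (6 - |r - 7|) / 36 for r = 2..12. */
static double p_roll(int r) {
    int d = r > 7 ? r - 7 : 7 - r;
    return d < 6 ? (6 - d) / 36.0 : 0.0;
}

/* Chance of flipping a lone remaining tile t when `failures` failing rolls
   are still allowed: the player gets failures + 1 rolls, and the game is
   won on the first roll that totals exactly t. */
static double lone_tile_win(int t, int failures) {
    assert(t >= 1 && t <= 12 && failures >= 0);
    double miss_every_roll = 1.0;
    for (int k = 0; k <= failures; k++)
        miss_every_roll *= 1.0 - p_roll(t);
    return 1.0 - miss_every_roll;
}
```

lone_tile_win(7, 5) comes to about 0.665, just under 2 in 3, while lone_tile_win(2, 5) is about 0.156, and a lone 1 can never be won.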
G&E Woodworking confirmed that this was a reasonable
modification to the game, and in fact G
would be able to fashion a small peg and drill holes
to keep count of the number of failing rolls. This
would distinguish the product on the market and
make for a fancier looking game.
The dynamic program is easily modified to track the
number of failing rolls (as shown in the appendix).
Table 1 shows how the number of
allowed failures affects the probability of winning.
Even allowing as few as 5 failures makes the game
reasonable in its chance of winning and allowing
10 failures still doesn’t make the game too easy. The
drawback of this, however, is that the game is less
responsive to the player’s actions. While a poor
strategy like Lex-Min is strongly punished if no
failures are allowed, with 10 failures allowed its
chance of winning is only about half that of optimal play.
The above analysis assumed that the player is forced
to make a move when a legal move exists. We can
make the game more responsive to the player’s
actions by allowing the player to strategically claim
a failure even when a legal move is possible. While
this uses up one of the allowed failures, it may leave
the player in a better position. For instance, in the
situation where only 2 and 5 are unflipped, if the
player rolls a 5, it is much better to declare a failure
if possible and hope for a 7 on the next roll, rather
than play the 5 and hope to roll a 2.
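A worked calculation shows how large the gain can be. Assume, for illustration, that exactly one failure remains when the 5 is rolled. Playing the 5 leaves only the 2, which must then come up within the next two rolls; declaring the failure keeps both tiles but leaves no failures, so the player then needs a 7, a 2 followed by a 5, or a 5 followed by a 2:

$$
\begin{aligned}
P(\text{win} \mid \text{play the } 5) &= \tfrac{1}{36} + \tfrac{35}{36}\cdot\tfrac{1}{36} = \tfrac{71}{1296} \approx 0.055,\\
P(\text{win} \mid \text{declare a failure}) &= \tfrac{6}{36} + \tfrac{1}{36}\cdot\tfrac{4}{36} + \tfrac{4}{36}\cdot\tfrac{1}{36} = \tfrac{224}{1296} \approx 0.173.
\end{aligned}
$$

Under this assumption, declaring the failure roughly triples the chance of winning.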
In this case, the decisions for Lex-Min and Lex-Max
do not change, but the optimal decision does strategically
use the allowed failures:
Table 2: Probabilities allowing Failures (chance of winning is 1 in...)
The overall gap between Lex-Max and the optimal
strategy is still smaller than would be ideal, so Lex-Max
remains a good starting heuristic.
As in the base game, however, there are still some
states where the difference between Lex-Max and
the optimal strategy is extreme. As an obvious example, if
failures are still allowed, the player should
never leave the “1” tile as the only unflipped tile:
no roll can then flip it.
Table 1: Probabilities allowing Failures (chance of winning is 1 in...)
This analysis can get quite subtle: in position
3 4 5 7 9 11, rolling an 8 (allowing 3-5
to be flipped) has an optimal decision to flip if
5 or fewer failures are allowed, but to declare failure
with more than 5, which is hardly a predictable
result.
Allowing failures seems to increase the subtlety of
the game. Without failures, there are 1297 state/roll
pairs (out of 4096 × 12 = 49,152) where the optimal
decision differs from Lex-Max. With 10
permitted failures, this increases to 14,975 out of
the 491,520 state/roll pairs. In addition to the complexity of
determining whether or not to unnecessarily declare
a roll a failure, there exist states where the decision
of which numbers to flip depends on the number of
allowed failures remaining.
6. Flip-Flop
Playing Flip leads to thoughts about more complex
games with the same board. Such games would be a
marketing advantage, and would extend the usefulness
of the game. Furthermore, some variations may
improve on Flip’s subtlety, which is hampered by
the similarity of the optimal strategy to Lex-Max.
Flip-Flop is a variation of the game that has the
property that the game always ends in a win (with
probability one). For this variation, the number tiles must
be colored differently front and back, and the
numeral must appear on both sides. G&E confirms
this can be done, with E applying a dark stain to one
side of the tile, and a light stain to the other. The
numerals begin all light-side up. The goal is to get
them all dark-side up. During each roll, tiles totaling
the roll must be rotated (either dark to light or
light to dark). Flip-Flop cannot end in failure, since
there are always legal moves (any roll r can be played
by turning the tile showing r). A player may, however,
find herself cycling between similar positions,
failing to make headway towards the goal. We call
turning a tile from light to dark flipping and from
dark to light flopping. A set of tiles is flipped if each
tile goes from light to dark.
The dynamic program for this problem is more
complicated, since the natural formulation is cyclic:
a tile can be flipped and later flopped back. Techniques
like value iteration are needed
in order to solve this problem. Again, the details are
in the appendix. Here we evaluate the quality of
Flip-Flop as a solitaire game.
Under optimal play, this game is expected to take
49.42 rolls to complete. The strategy seems subtle,
though there are some general guidelines:
Until 12 is flipped, the game is expected to last
around 50 more rolls; once the 12 is flipped, the
game reduces to around 25 rolls in expectation.
Once a number of 7 or higher is flipped, it is never
flopped.
Once all the values of 7 or higher are flipped, the game
is expected to last no more than 10 more rolls, and no
such position is expected to take fewer than
7.5 rolls.
The conclusion drawn from this is that the initial
goal is simply to get the values 7 and higher flipped,
without much care for the resulting position among
1-6.
Expectations only give part of the story, however.
To determine the full distribution of outcomes, we
need to simulate the system using the optimal
decision rule. I ran 10,000 simulations, which gave the
following distribution of time to finish:
Table 3: Simulated Rolls to Finish
Clearly there is a wide range of possibilities! Given
the speed at which a game can be played, however,
Flip-Flop seems a reasonable game, though the
frustration of playing optimally and still taking more
than 300 rolls, as happened in 5 of the 10,000 runs,
is obvious.
To determine the responsiveness and subtlety of the
game, it is again necessary to find one or more
natural heuristics to compare against. Some
obvious heuristics do very poorly. For instance,
taking the Lex-Max flip, if one exists, and
otherwise flopping the number corresponding to the
roll is a relatively simple rule. Such a rule appears
to have no bound on the number of rolls it needs. This speaks well for the
responsiveness of the game: the optimal rule works
much better than very simple rules.
Given the insights from the optimal strategy (which
generally would not be available to the player), we
can create a simple heuristic that does have reasonable
bounds. The idea is to take the Lex-Max flip
when possible. When no flip is possible, a random
feasible move is chosen with the proviso that no
number at or above a chosen “no-flop” value is flopped.
The quality of this heuristic depends on the
no-flop value (see Table 4).
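A minimal C sketch of this heuristic follows. The names are hypothetical; tile i is assumed to be stored in bit i-1 with a set bit meaning light side up; rand() stands in for both the dice and the random choice; and falling back to an unrestricted move when every candidate would flop a protected tile is an added assumption (it is never needed for no-flop values up to 7, since tiles 1 to 6 alone can make any total from 2 to 12). Because two distinct sets of tiles with the same total cannot be prefixes of one another, the Lex-Max flip is simply the largest qualifying mask.

```c
#include <assert.h>
#include <stdlib.h>

#define NTILES 12
#define FULL ((1u << NTILES) - 1u) /* every tile light side up */

/* Total value of the tiles whose bits are set (bit i-1 stands for tile i). */
static int tile_sum(unsigned mask) {
    int sum = 0;
    for (int i = 0; i < NTILES; i++)
        if (mask & (1u << i)) sum += i + 1;
    return sum;
}

/* Two fair dice via rand(); adequate for a rough simulation. */
static int roll_dice(void) { return rand() % 6 + 1 + rand() % 6 + 1; }

/* Tiles to turn over for one roll. A set bit in `state` is a light tile.
   Prefer the Lex-Max flip (light tiles only); otherwise pick uniformly among
   moves that flop no dark tile of value >= no_flop, or among all moves if
   none qualifies. */
static unsigned heuristic_move(unsigned state, int roll, int no_flop) {
    assert(roll >= 2 && roll <= 12);
    for (unsigned u = state; u != 0; u = (u - 1) & state)
        if (tile_sum(u) == roll) return u; /* first hit = largest mask */
    unsigned protect = 0; /* dark tiles that must not be flopped */
    for (int v = no_flop < 1 ? 1 : no_flop; v <= NTILES; v++)
        if (!(state & (1u << (v - 1)))) protect |= 1u << (v - 1);
    unsigned ok[16], any[16]; /* at most 15 tile sets share a total <= 12 */
    int nok = 0, nany = 0;
    unsigned limit = (1u << roll) - 1u; /* only tiles 1..roll can be used */
    for (unsigned u = 1; u <= limit; u++) {
        if (tile_sum(u) != roll) continue;
        any[nany++] = u;
        if ((u & protect) == 0) ok[nok++] = u;
    }
    /* nany >= 1, since the tile showing `roll` is always a legal move. */
    return nok > 0 ? ok[rand() % nok] : any[rand() % nany];
}

/* Plays one game from all-light and returns the number of rolls used,
   stopping at `cap` so that a cycling game cannot run forever. */
static long play_heuristic(int no_flop, long cap) {
    unsigned state = FULL;
    long rolls = 0;
    while (state != 0 && rolls < cap) {
        state ^= heuristic_move(state, roll_dice(), no_flop);
        rolls++;
    }
    return rolls;
}
```

Averaging play_heuristic over many games for several no-flop values is one way to run the kind of comparison summarized in Table 4; the cap only guards against runaway games.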
Even with the insight from the optimal solution, this
simple heuristic does significantly worse than optimal.
All of the shortfall is in the final phase, with
the heuristic with a no-flop value of 7 taking, on
average, 14 rolls once the values 7 or higher have
been flipped (rather than the optimal of about 8).
This speaks well for the subtlety of the game, particularly
in the final phase.
It is interesting to note that the best choice in this
heuristic is to never flop a 6 or higher, while the
optimal strategy sometimes (but very rarely) flops a
six. This again shows that the optimal solution is
subtler than this simple heuristic can manage.
7. Results and Other Approaches
As a result of this analysis, G&E Woodworking
modified the rules of Flip to allow up to five failing
rolls and to allow a player to declare a failure, even
when a legal move was available. They also stained
the sides of the tiles different colors, included numbers
on both sides and included the rules to
Flip-Flop.
The change implemented to Flip was not the only
possibility. For instance, the winning conditions
could have been changed to allow, say, a “win” if
either 0 or 1 tile remains unflipped. Another feasible
winning rule would be to allow a win if either
all the tiles are flipped or the player successfully
makes some number of rolls. A third choice would
be to begin the game with, say, the numbers 10
through 12 already flipped (in this latter game,
optimal play gives a 1 in 14 chance of winning).
Table 4: Heuristic results for Flip-Flop
Another direction could have been to change the
game into a two person game. For instance, players
could alternate rolls on a single board, with the
person who is first unable to make a move declared
the loser (a simple modification of the code for Flip
calculates the value of this game: the first player to
roll has a 50.9% chance of winning if both sides
play optimally). Or players could each have their
own board and make simultaneous rolls until one
either flips his entire board (and is the winner) or
cannot make a move (and is the loser).
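The single-board two-person value can be computed with the same bottom-up recursion as Flip. The C sketch below rests on assumptions the text does not spell out: one shared board, tile i stored in bit i-1, hypothetical names, and a player who faces an empty board counting as unable to move (so flipping the last tile wins). Here win[s] is the chance that the player about to roll wins; the text reports 50.9% for the first player under its own formulation, which serves as a check on whether these assumptions match the author's.

```c
#include <assert.h>

#define NTILES 12
#define NSTATES (1u << NTILES)

/* Total value of the tiles whose bits are set (bit i-1 stands for tile i). */
static int tile_sum(unsigned mask) {
    int sum = 0;
    for (int i = 0; i < NTILES; i++)
        if (mask & (1u << i)) sum += i + 1;
    return sum;
}

/* Probability that two fair dice total r. */
static double p_roll(int r) {
    int d = r > 7 ? r - 7 : 7 - r;
    return d < 6 ? (6 - d) / 36.0 : 0.0;
}

/* win[s]: chance that the player about to roll wins from shared board s
   (bit i-1 set = tile i face up), both players playing optimally.
   Assumption: a player facing an empty board cannot move and loses.
   Each move clears bits, so increasing mask order is a valid order. */
static void solve_two_player(double win[NSTATES]) {
    for (unsigned s = 0; s < NSTATES; s++) {
        double best[13] = {0}; /* roller's best chance for each roll total */
        for (unsigned u = s; u != 0; u = (u - 1) & s) {
            int t = tile_sum(u);
            double after = 1.0 - win[s & ~u]; /* the opponent rolls next */
            if (t <= 12 && after > best[t]) best[t] = after;
        }
        double v = 0.0; /* rolls with no legal move keep best[r] = 0: a loss */
        for (int r = 2; r <= 12; r++) v += p_roll(r) * best[r];
        win[s] = v;
    }
}
```

After solve_two_player(win), win[NSTATES - 1] is the first player's chance from the opening position under these assumptions.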
It is an interesting class exercise to take the dynamic
program developed and to modify it to handle these
and other objectives. The two-person games are very
difficult to analyze, but related problems of making
decisions to maximize the expected number of rolls
that can be made give insight into the underlying
model and lead to interesting analyses of the resulting
optimal decisions. For example, if the goal is to
minimize the expected number of rolls needed to
flip all the numbers, it is no longer optimal to
always flip the number corresponding to the roll.
For instance, on a roll of 6, in some states it is better
to remove 1, 2, and 3 rather than 6.
While Flip itself is rather straightforward (but
surprisingly addictive, even to one who spends too
much time playing computer games) and the
analysis is a straightforward application of dynamic
programming, it is the use of the insights given by
optimization to make good decisions (in this case,
deciding on the rules to enhance the marketing of
the game) that is interesting and difficult. Flip has
turned out to be a good classroom example of not
just how we do operations research, but why we do
it.
Appendix I
Modeling and Implementation
The modeling of this problem as a dynamic program
is a straightforward classroom exercise that can degenerate
into a flurry of notation. We begin with the
formulation of Flip:
First we define the states and decisions. A state will
correspond to an arrangement of tiles and will be
given by a length 12 0-1 string, generally denoted s.
s[i]=1 corresponds to tile i being unflipped, while
s[i]=0 corresponds to a flipped tile. There are
2^12=4096 states. We will refer to the “all flipped”
state as the 0-state.
Define f(s) to be the probability of winning under
optimal play beginning at state s. A decision d(s,r)
(where d(s,r) is the state chosen after starting in s
and rolling r) must be made for every state s and
roll of the dice r. Denote the probability of roll r as
p(r).
Then
$$ f(s) = \sum_{r=2}^{12} p(r)\, f\big(d(s,r)\big) \qquad (1) $$
To determine d(s,r), we need to determine which
states are possible starting from s and rolling r. Let
R(r) be the set of length-12 0-1 strings whose “1”
entries add up to r. For instance, for r=5, R(r)
consists of the strings
000010000000
100100000000
011000000000
Define s ≤ t for two vectors s and t in the normal
way, with each component of s being less than or
equal to the corresponding component of t. Then
$$ d(s,r) = s - \arg\max_{u \in R(r),\ u \le s} f(s-u) \qquad (2) $$
In (2), u represents a feasible decision for state s
and roll r, and s − u is the state it leads to. If the set over which the argmax is taken
is empty, then the corresponding probability of
winning is 0.
Finally, once all tiles are flipped, we are successful,
so we can base our dynamic program on
$$ f(0) = 1 \qquad (3) $$
Between (1), (2), and (3) we now have our dynamic
programming formulation.
To solve this dynamic program, we can recursively
calculate f(s). We need to order our calculations to
use only previously calculated values. This is done
by ordering the states in increasing order of the
number of “1”s in their representation (with ties
broken arbitrarily) since once a tile is flipped, it can
never be unflipped.
The code provided to solve this recursion uses
reasonably efficient data structures to model this
problem. The R(r) values are kept in a linked list,
and all states and roll values are packed into integer
values (so, for instance, the value 19 = 1 + 2 + 16 is used to
represent the string 110010000000).
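Recursion (1) to (3) fits in a short function. The C sketch below is hypothetical and is not the author's flip.c: it packs states as integers in the same spirit (tile i in bit i-1, set while unflipped), enumerates the moves u ≤ s as submasks instead of keeping R(r) in linked lists, and visits states in increasing numeric order. That order is valid for the same reason as the one described above: flipping tiles always yields a numerically smaller mask.

```c
#include <assert.h>

#define NTILES 12
#define NSTATES (1u << NTILES)

/* Total value of the tiles whose bits are set (bit i-1 stands for tile i). */
static int tile_sum(unsigned mask) {
    int sum = 0;
    for (int i = 0; i < NTILES; i++)
        if (mask & (1u << i)) sum += i + 1;
    return sum;
}

/* Probability that two fair dice total r. */
static double p_roll(int r) {
    int d = r > 7 ? r - 7 : 7 - r;
    return d < 6 ? (6 - d) / 36.0 : 0.0;
}

/* f[s]: probability of winning Flip under optimal play from state s, where
   bit i-1 of s is set while tile i is unflipped. Implements f(0) = 1 and
   f(s) = sum_r p(r) * max f(s - u) over moves u <= s totalling r, with an
   empty max counting as 0. Flipping tiles always lowers the mask, so
   filling the table in increasing mask order only reads finished entries. */
static void solve_flip(double f[NSTATES]) {
    f[0] = 1.0;
    for (unsigned s = 1; s < NSTATES; s++) {
        double best[13] = {0}; /* best[r]: value of the best move for roll r */
        for (unsigned u = s; u != 0; u = (u - 1) & s) {
            int t = tile_sum(u);
            if (t <= 12 && f[s & ~u] > best[t]) best[t] = f[s & ~u];
        }
        double v = 0.0;
        for (int r = 2; r <= 12; r++) v += p_roll(r) * best[r];
        f[s] = v;
    }
}
```

After solve_flip(f), f[NSTATES - 1] holds the value of the opening position, which the text reports as .00362218, and a lone 7 is worth 6/36.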
Only a few changes are needed to formulate the
version of Flip where failing rolls are allowed. The
state space must include the number of failures so
far (so a state has the form (s,b), where b is the number of
failures). The failure value increases either when
there is no feasible decision for a state and roll or
when declaring a failure is the optimal decision.
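One way to sketch this extension in C is shown below. It makes two presentational choices of its own: it indexes layers by k, the number of failures still allowed (equivalent to tracking b, the number used), and a flag switches between forced moves (the setting of Table 1) and the option to declare a failure strategically (the setting of Table 2). The names and the packing (tile i in bit i-1) are hypothetical.

```c
#include <assert.h>

#define NTILES 12
#define NSTATES (1u << NTILES)

/* Total value of the tiles whose bits are set (bit i-1 stands for tile i). */
static int tile_sum(unsigned mask) {
    int sum = 0;
    for (int i = 0; i < NTILES; i++)
        if (mask & (1u << i)) sum += i + 1;
    return sum;
}

/* Probability that two fair dice total r. */
static double p_roll(int r) {
    int d = r > 7 ? r - 7 : 7 - r;
    return d < 6 ? (6 - d) / 36.0 : 0.0;
}

/* g[k][s]: chance of winning from state s (bit i-1 set = tile i face up)
   with k failing rolls still allowed, k = 0..max_fail. A failure leaves the
   tiles alone and drops to layer k-1, so each layer needs only lower masks
   of the same layer and the layer below it. If allow_declare is nonzero the
   player may also declare a failure when a legal move exists. */
static void solve_flip_failures(double g[][NSTATES], int max_fail, int allow_declare) {
    for (int k = 0; k <= max_fail; k++) {
        g[k][0] = 1.0;
        for (unsigned s = 1; s < NSTATES; s++) {
            double best[13] = {0};
            int has_move[13] = {0};
            for (unsigned u = s; u != 0; u = (u - 1) & s) {
                int t = tile_sum(u);
                if (t > 12) continue;
                has_move[t] = 1;
                if (g[k][s & ~u] > best[t]) best[t] = g[k][s & ~u];
            }
            double fail = k > 0 ? g[k - 1][s] : 0.0; /* value of spending a failure */
            double v = 0.0;
            for (int r = 2; r <= 12; r++) {
                double b = best[r];
                if (!has_move[r]) b = fail;                   /* forced failure */
                else if (allow_declare && fail > b) b = fail; /* strategic failure */
                v += p_roll(r) * b;
            }
            g[k][s] = v;
        }
    }
}
```

For instance, with tiles 2 and 5 up, one failure left, and a roll of 5, playing the 5 is worth g[1][{2}] = 71/1296 while declaring the failure is worth g[0][{2, 5}] = 224/1296, so the optimal rule declares.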
Formulating Flip-Flop is slightly more complicated.
We use the same state definitions as before. f(s) is
defined to be the expected number of rolls remaining
under optimal play, starting from state s.
Again d(s,r) is the state chosen after starting in s and rolling r.
Now we have:
$$ f(s) = \sum_{r=2}^{12} p(r)\, f\big(d(s,r)\big) + 1 \qquad (1) $$
where the “+1” adds in the current roll.
Since flipped tiles can be flopped, it is no longer
necessary for the new state to be less than or equal to the current state.
Instead, the new state is given by s ⊕ u, where ⊕
represents the component-wise “exclusive-or”
operation. This gives us:
$$ d(s,r) = s \oplus \arg\min_{u \in R(r)} f(s \oplus u) \qquad (2) $$
Finally, once we are in the 0 state, we are done, so
$$ f(0) = 0 \qquad (3) $$
To solve this, it is no longer possible to use straightforward
backwards recursion: the states do not have
a natural ordering. Instead some version of policy
or value iteration must be used to solve (1), (2) and
(3). The code given uses a simple value iteration
method whereby an initial set of values is refined
until there is no further change.
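A minimal C sketch of that value iteration follows; it is hypothetical and not the author's flip_flop.c. It assumes tile i is stored in bit i-1 with a set bit meaning light side up, precomputes R(r) once (in Flip-Flop every subset totalling r is legal in every state), starts with all values at zero, and updates in place until the largest change in a sweep drops below a tolerance.

```c
#include <assert.h>

#define NTILES 12
#define NSTATES (1u << NTILES)

/* Total value of the tiles whose bits are set (bit i-1 stands for tile i). */
static int tile_sum(unsigned mask) {
    int sum = 0;
    for (int i = 0; i < NTILES; i++)
        if (mask & (1u << i)) sum += i + 1;
    return sum;
}

/* Probability that two fair dice total r. */
static double p_roll(int r) {
    int d = r > 7 ? r - 7 : 7 - r;
    return d < 6 ? (6 - d) / 36.0 : 0.0;
}

/* f[s]: expected rolls remaining under optimal play from state s, solving
   f(s) = sum_r p(r) * min_{u in R(r)} f(s XOR u) + 1 with f(0) = 0.
   Returns the number of sweeps used (max_sweeps if tol was never reached). */
static int solve_flip_flop(double f[NSTATES], double tol, int max_sweeps) {
    unsigned R[13][16]; /* R[r]: every tile subset totalling r (at most 15) */
    int nR[13] = {0};
    for (unsigned u = 1; u < NSTATES; u++) {
        int t = tile_sum(u);
        if (t <= 12) R[t][nR[t]++] = u;
    }
    double p[13];
    for (int r = 0; r <= 12; r++) p[r] = p_roll(r);
    for (unsigned s = 0; s < NSTATES; s++) f[s] = 0.0;
    for (int sweep = 1; sweep <= max_sweeps; sweep++) {
        double change = 0.0;
        for (unsigned s = 1; s < NSTATES; s++) { /* f[0] stays 0 */
            double v = 1.0;                      /* the roll about to be made */
            for (int r = 2; r <= 12; r++) {
                double best = f[s ^ R[r][0]];
                for (int i = 1; i < nR[r]; i++)
                    if (f[s ^ R[r][i]] < best) best = f[s ^ R[r][i]];
                v += p[r] * best;
            }
            double d = v > f[s] ? v - f[s] : f[s] - v;
            if (d > change) change = d;
            f[s] = v; /* in place: later states in this sweep see the new value */
        }
        if (change < tol) return sweep;
    }
    return max_sweeps;
}
```

After solve_flip_flop(f, 1e-9, 100000), f[NSTATES - 1] is the expected length of a game from the all-light start, the quantity the text reports as 49.42 rolls. A small change per sweep is only a practical stopping rule; it does not by itself bound the distance to the exact solution.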
The formulations given have proven to be at the edge
of the capabilities of my (MBA) students: some are
fluent enough mathematically to generate the results
in this appendix while others are not (Carnegie
Mellon’s MBA class is probably not typical,
however, due to a higher number of technically oriented
students than most MBA programs). Once
we go through these examples in class, however,
students typically are able to work through formulations
of variations and, for those with programming
skills, update the programs to reflect their
formulations.
Appendix II
Links to Source Code and Optimal Strategy
Game: Flip
Program in C: flip.c
http://ite.informs.org/vol2no1/Trick/flip.c
Program in Java: Flip.java
http://ite.informs.org/vol2no1/Trick/Flip.java
Optimal Strategy: flip.out (1.95Mb)
http://ite.informs.org/vol2no1/Trick/flip.out
Game: Flip-Flop
Program in C: flip_flop.c
http://ite.informs.org/vol2no1/Trick/flip_flop.c
Program in Java: FlipFlop.java
http://ite.informs.org/vol2no1/Trick/FlipFlop.java
Optimal Strategy: flip_flop.out (3.90Mb)
http://ite.informs.org/vol2no1/Trick/flip.out