
# Lecture 3 - Decision Making

Presented on Dec 04, 2011



This is the 3rd of an 8 lecture series that I presented at University of Strathclyde in 2011/2012 as part of the final year AI course.

This lecture moves beyond the Game Theoretic definition of a game, and demonstrates how algorithms can be used not only to find a single good choice, but a sequence of choices that will eventually reach a winning state.



## Lecture 3 - Decision Making: Presentation Transcript

• Making Decisions in Games
• Theory of Real Games • We’ve been talking about “games” as single instances of choice - heads/tails, odds/evens etc. • We’ve talked about how we can repeat the game (iterating) and interesting things happen. • Are most games the same choice repeatedly?
• Real Games • At a much less abstract level, a game is not one choice repeated. • It is a sequence of different choices. • Delayed reward
• Delayed Reward • Last week we could see the payoffs for each choice pair in the games. • Does a single move in chess have a “reward”? • The reward is whether the game is won or lost - the combined result of the choice sequence
• Evaluating Delayed Rewards • We need to evaluate what the expected payoff of a given choice is. • Typically we can only do this at the end of the game. • How can we decide what to do now if we won’t know if it was a good decision until later?
• Chess • Opening move is one choice. • Opponent makes their move. • You reply. • Note that your 2nd move is a totally different theoretical “game” to the first move.
• Chess • Initially there are 20 opening moves • Your opponent has 20 responding moves • 2 moves in, the size of the potential state space is 400 states. • The game gets more complicated later ‣ Average number of moves per turn: 35 ‣ Average game length: 80 plies • State space size (Shannon’s number): 35^80 ≈ 3×10^123 - HUGE
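The arithmetic on this slide can be checked directly (a quick sketch using the averages quoted above):

```python
import math

# Two plies into chess: 20 opening moves x 20 replies
states_after_two_plies = 20 * 20
print(states_after_two_plies)                 # 400

# Shannon-style estimate: ~35 legal moves per position, ~80 plies per game
magnitude = math.floor(math.log10(35 ** 80))  # order of magnitude of 35^80
print(magnitude)                              # 123, i.e. roughly 10^123
```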
• Search • This state space is way too big for an exhaustive search approach like minimax • Any brute force approach is not going to work • We need some mechanism to guide the search towards areas of the game tree that are useful
• Heuristics • A heuristic is formally a “strategy using readily accessible, though loosely applicable, information to control problem solving in human beings and machines” • Less formally, it’s a guesstimate of the value of a state, typically based on the distance to the goal (planning) or likelihood of winning (games)
• Using Heuristics • Heuristics guide search across spaces that are too complex to fully enumerate. • Estimate the potential of the next set of states using the heuristic and go with the best looking one. • Can be combined with a search strategy like Best First Search or Enforced Hill Climbing
• Heuristic Example - A* • A* search for path planning is a great example of heuristics in use. • In a world of tiles, find an optimal path from A to B. • A* uses two metrics: ‣ Concrete measure of the work done to reach a location (g) ‣ Estimate of the work to get from that location to the goal (h) • The search strategy always expands the location that minimises g + h
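The g/h bookkeeping on this slide can be sketched as a small grid search (a minimal sketch; the grid layout below is invented for illustration, with h as the Manhattan distance to the goal):

```python
import heapq

def astar(grid, start, goal):
    """A* over a tile grid; grid[r][c] == 1 means a wall."""
    # h: Manhattan-distance estimate of the remaining work
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    # Each entry: (g + h, g, tile, path so far)
    open_heap = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_heap:
        f, g, tile, path = heapq.heappop(open_heap)
        if tile == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = tile[0] + dr, tile[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                ng = g + 1  # concrete cost of one more step
                if ng < best_g.get((r, c), float("inf")):
                    best_g[(r, c)] = ng
                    heapq.heappush(
                        open_heap, (ng + h((r, c)), ng, (r, c), path + [(r, c)]))
    return None

# 0 = free tile, 1 = wall; find a path from A (top-left) to B (bottom-right)
grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]
path = astar(grid, (0, 0), (4, 3))
print(len(path) - 1)  # 13 moves: the walls force the search to snake around
```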
• Heuristic Example - A* (slides 12-37) - a step-by-step animation of A* on a tile grid: starting from A, tiles are expanded in order of lowest g + h, with labels such as 1+7, 2+6, 3+5 ... 7+1 marking the cost so far plus the estimate at each tile, until the search reaches B.
• Heuristics • Heuristics can guide our search • Help us understand which states are bringing us closer to our goals • Allow us to backtrack when a promising route becomes problematic • Do they work well for games?
• The Maths of Choice • Common (basic) combinatorics problem: ‣ How many k-element sub-sets can I make from a set of n elements? • Less formally: ‣ How many different ways can I pick k things from n things?
• Choice • We can refer to this as “Choosing” • “I have 5 things, I choose 2” • We can write it as: 5 C 2
• Binomials • Mathematically, n C k is the binomial coefficient • It can be written as ‣ n! / ( k! (n − k)! ) - equivalently, the falling power n(n−1)⋯(n−k+1) divided by k!
• Permutations • The choose operator tells you how many sets there are with unique elements. • What if the order that the elements are in matters? • For this we use Permutation ‣ n P k • Equivalent to: ‣ n! / (n − k)!
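Both operators are in Python’s standard library, so the factorial definitions on these slides can be checked against it (a quick sketch):

```python
import math

n, k = 5, 2

# nCk: unordered k-element subsets -> n! / (k! (n-k)!)
n_choose_k = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# nPk: ordered selections -> n! / (n-k)!
n_perm_k = math.factorial(n) // math.factorial(n - k)

print(n_choose_k, n_perm_k)   # 10 20

# The standard library agrees with the factorial forms
assert n_choose_k == math.comb(n, k)
assert n_perm_k == math.perm(n, k)
```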
• Poker • Card game. • Typically involves gambling. • “Poker” is technically an entire set of different games that share a similar structure. • For the purposes of this lecture, Poker refers specifically to “Limit Texas Hold ‘Em”
• Texas Hold ‘Em • Variant of poker created in the early 1900s • Typically 2-10 player games • Popular recently - Poker on TV and online is typically Texas Hold ‘Em • Aim is to make the best 5 card hand possible using any of two private “hole” cards and 5 public “community” cards
• Phases of the Game • The game is broken into four phases. • Initial or “Pre-flop” - Hole cards are dealt and a round of betting occurs. • Flop - The first three community cards are dealt, another round of betting. • Turn - A fourth community card is dealt, and a round of betting • River - Final community card dealt, final betting
• Some Terminology • Raise - Increase the bet amount • Fold - Give up on this game, losing any money already bet • Call - Put in an amount of money to equal what others are wagering • Blinds - An initial mandatory wager by two players, Small and Big. Responsibility for the blinds rotates each game.
• Poker in Research • Poker has been a major research area for AI for many years. • Characteristics in common with many real world problems ‣ Hidden information ‣ Bluffing ‣ Loss minimisation
• Poker at SAIG • Major research area for us for many years • Under my supervision for the last 2 years as honours projects and Summer internships. • Much of what you’re going to hear about this week is based on current research happening right now at SAIG
• Strathclyde Poker Research Environment • SPREE was developed to overcome two challenges we face. ‣ Training data sets obtained from online casinos contain imperfect information, which leads to bad machine learning ‣ Every research project wasted significant time re-implementing a framework for Poker • SPREE is an open-source client/server implementation in Java, with an AI-based client and a GUI client. • http://sourceforge.net/projects/spree-poker
• Limit or No Limit? • Two types of game - Limit and No Limit • No Limit - Classic movie Poker. ‣ Raises can be any amount ‣ Any number of raises • Limit - Common rule set ‣ Raises are a single fixed amount ‣ Limited number of raises allowed per round
• Limit or No Limit? • Focus on Limit • Significantly reduces the complexity of the problem. • Also means we can focus on the game, rather than the psychological aspects.
• Poker State Space • At each point, each player typically has 3 options ‣ Raise, Call, Fold • We can approximate the size of the search space after k choices as 3^k • We can also determine lower and upper bounds for k, since in Limit there is a fixed number of raises.
• Dealing Cards • For a game of N players, 2N + 5 cards are required. • There are 52 C (2N + 5) different sets of cards that could be dealt. • But who gets which card is important, so we need to use Permutation not Choose • 52 P (2N + 5) ‣ For a standard 10 player game: ≈ 7.4×10^39
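Python’s `math.perm` gives the number of ordered deals for a standard 10-player table (a quick check sketch):

```python
import math

N = 10                        # players
cards = 2 * N + 5             # hole cards plus five community cards
deals = math.perm(52, cards)  # order matters: who gets which card
print(f"{deals:.2e}")         # about 7.4e+39
```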
• Length of a Poker Game (Lower) • In the shortest game possible, all players fold. • The last player (who put in the Big Blind) wins by default • N − 1 choices to reach this point • 2N cards are required • 3^(N−1) × 52 P 2N • For a standard 10 player game: ‣ 19683 × 3×10^32 ≈ 6×10^36
• Length of a Poker Game (Upper) • In the longest game possible • All players initially call, final player to call instead raises. • 4N − 4 turns per round, 4 rounds = 16N − 16 turns total • 2N + 5 cards required • 3^(16N−16) × 52 P (2N + 5) • Again for a 10 player game ‣ 5×10^68 × 7.4×10^39 ≈ 3.7×10^108
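Both bounds can be computed exactly (a sketch of the arithmetic on the last two slides):

```python
import math

N = 10
# Shortest game: N-1 fold decisions, 2N hole cards dealt
lower = 3 ** (N - 1) * math.perm(52, 2 * N)
# Longest game: 16N-16 betting turns, all 2N+5 cards dealt
upper = 3 ** (16 * N - 16) * math.perm(52, 2 * N + 5)
print(f"{lower:.1e}")   # about 6e+36
print(f"{upper:.1e}")   # roughly 3.8e+108
```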
• Total State Space Size • The total state space is smaller than Shannon’s number • Still completely unwieldy for any kind of exhaustive search • Note that we’ve considered the lower and upper bounds of the state space. • Actual values will typically fall somewhere between. • Also note that the upper bound hinges on the restrictions imposed by Limit, so we don’t need to consider the extra state complexity a variable raise size would introduce.
• Abstraction • There are some things we can do to trim this down (a bit) • Firstly, we can simplify our view of the starting position • We don’t need to consider all possible cards that could be dealt ‣ Cards that will help us change the situation ‣ Cards that don’t help us can be grouped together
• Starting Hands • There are 52 C 2 = 1,326 potential opening hands • But we can reduce this ‣ Suit doesn’t matter except for matching ‣ We can reduce a hand to two ranks, “suited” or “unsuited” ‣ 2c, 7d is equivalent to 2d, 7c or 2s, 7h • This gives a total number of abstract hands of ‣ 13 (pairs) + 13 C 2 (suited) + 13 C 2 (unsuited) = 169 • We’ll see tomorrow there are more abstractions.
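The 169 figure can be verified by enumerating every concrete hand and collapsing it to (high rank, low rank, suited) - a sketch:

```python
from itertools import combinations

ranks = "23456789TJQKA"
suits = "cdhs"
deck = [(r, s) for r in ranks for s in suits]

def abstract(hand):
    """Collapse a concrete 2-card hand to (high rank, low rank, suited?)."""
    (r1, s1), (r2, s2) = hand
    hi, lo = sorted((r1, r2), key=ranks.index, reverse=True)
    return (hi, lo, s1 == s2)

concrete = list(combinations(deck, 2))
print(len(concrete))                         # 1326 concrete hands
print(len({abstract(h) for h in concrete}))  # 169 abstract classes
```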
• Heuristics for Poker • “Every hand’s a winner and every hand’s a loser” • Heuristics for Poker are tricky because of this. • Analysis is largely based on your own hand - if my hand at a point is such-and-such a type or better, it is worth playing • Kind of naive
• “Expert” Poker Systems • You can make a somewhat capable agent by combining a bunch of these naive heuristics. • It’s known which of the starting hands are strong and which are weak. • You can make a guess as to what you should do based on your hand strength. ‣ This is not massively informed • A basic, functional approach that attempts to lift out general rules that will lead to good results.
• Evaluating Delayed Reward in Poker • I’ve mentioned delayed reward a few times • How does this fit into Poker? • We know that the strength of our hand alone won’t decide the game. • We know that opponents can bluff about their hand strength. • Need to find out “what happens if” for possible actions
• Monte Carlo Tree Search • Monte Carlo sampling was used informally (without a formal definition) by Buffon and Fermi (among others) • Developed at Los Alamos by our Game Theory friend John von Neumann • For a large enough sample size, random sampling can often take the place of exhaustive enumeration
• Samples and Probes • When we say a “random sample” we want to sample the potential outcomes ‣ And find the potential rewards • The leaf nodes of the game tree have the final value of the game. • By randomly walking from the current node to leaf nodes, we can build up a picture of where our actions might lead us.
• Exploration vs Exploitation • We can sample at random, and we’ll get coverage in all areas • Some areas are more promising than others • We want to “exploit” these areas and inspect them closely ‣ Ensure that they are as good as they look • At the same time, we want to keep “exploring” in case there are better areas in the game tree. • Balancing these two contradictory goals falls to the UCT heuristic.
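The balance UCT strikes is usually implemented with the UCB1 rule applied at each tree node: mean reward plus an exploration bonus that shrinks as a child is visited more often. A minimal sketch (the child statistics below are invented for illustration):

```python
import math

def uct_select(children, c=math.sqrt(2)):
    """Pick the index of the child maximising mean reward (exploitation)
    plus the UCB1 exploration bonus. Each child is (total_reward, visits)."""
    parent_visits = sum(n for _, n in children)
    def score(child):
        reward, n = child
        return reward / n + c * math.sqrt(math.log(parent_visits) / n)
    return max(range(len(children)), key=lambda i: score(children[i]))

# Child 0 has the best average so far, but child 2 is barely explored,
# so its exploration bonus dominates and it gets sampled next.
children = [(60, 100), (30, 100), (3, 4)]
print(uct_select(children))   # 2
```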
• Reward Evaluation • We can use the Monte Carlo samples to simulate down to the end of the game. • Establish whether we win or lose (and how much). • Bubble this value back up the tree. • Build a picture of the amount we can expect to win based on the actions we are considering this turn.
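The idea above can be sketched as flat Monte Carlo evaluation: average many random play-outs per candidate action and act on the best estimate. This is a toy sketch, not SPREE code; the actions and rollout payoffs are invented stand-ins for “play the rest of the hand at random”:

```python
import random

def evaluate_actions(actions, rollout, samples=10_000, rng=None):
    """Estimate each action's expected end-of-game reward by
    averaging many random play-outs from the current state."""
    rng = rng or random.Random()
    return {a: sum(rollout(a, rng) for _ in range(samples)) / samples
            for a in actions}

# Invented payoffs: folding always loses the blind, calling breaks even
# on average, raising is high variance but positive on average.
def rollout(action, rng):
    if action == "fold":
        return -1
    if action == "call":
        return rng.choice([-2, 2])
    return rng.choice([-10, -10, 25])   # raise

est = evaluate_actions(["fold", "call", "raise"], rollout,
                       rng=random.Random(0))
print(max(est, key=est.get))            # raising wins on expectation
```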
• Caveat Emptor • What we’ve seen today is just ONE approach to tackling Poker. • It’s an open challenge in AI to find a good solution • The techniques used are important • More important is the reasoning for using these approaches. • AI as a toolkit, not a definitive solution.
• Sampling the St Petersburg Paradox

| Payout (£) | Games |
|---|---|
| 1 | 2,147,483,647 |
| 2 | 834,532,607 |
| 4 | 435,781,603 |
| 8 | 222,566,052 |
| 16 | 108,347,756 |
| 32 | 54,225,257 |
| 64 | 27,184,330 |
| 128 | 13,605,016 |
| 256 | 6,792,164 |
| 512 | 3,393,086 |
| 1024 | 1,698,228 |
• Sampling the St Petersburg Paradox • If we repeatedly play out the St Petersburg game we see that it behaves much as we expect • Half the games end immediately, a quarter after 1 turn and so on. • After 4,000,000,000 samples, the average is only £14.50 • Where the Expected Value metric didn’t inform our decision making, we can use sampling to see what actually happens!
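The experiment can be reproduced at a smaller scale (a sketch; with ~10^5 games the running average typically lands well below the infinite expected value, and it grows only slowly with the sample count, which is why 4×10^9 games still average just £14.50):

```python
import random

def st_petersburg(rng):
    """A £1 pot that doubles on every head; the game ends on the first tail."""
    payout = 1
    while rng.random() < 0.5:
        payout *= 2
    return payout

rng = random.Random(42)
n = 100_000
mean = sum(st_petersburg(rng) for _ in range(n)) / n
print(f"average over {n:,} games: about £{mean:.2f}")
```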
• Summary • Understanding real games • Delayed reward systems • Poker • Monte Carlo with UCT (in brief)
• Next Lecture • More on Monte Carlo • Describing a player mathematically • Categorising players into types • Using this classification for better decisions