Lecture 3 - Decision Making
 

This is the third of an eight-lecture series that I presented at the University of Strathclyde in 2011/2012 as part of the final year AI course.

This lecture moves beyond the Game Theoretic definition of a game, and demonstrates how algorithms can be used to find not just a single good choice, but a sequence of choices that will eventually reach a winning state.

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Making Decisions in Games
    • Theory of Real Games • We’ve been talking about “games” as single instances of choice - heads/tails, odds/evens etc. • We’ve talked about how we can repeat the game (iterating) and interesting things happen. • Are most games the same choice repeatedly?
    • Real Games • At a much less abstract level, a game is not one choice repeated. • A sequence of different choices. • Delayed reward
    • Delayed Reward • Last week we could see the payoffs for each choice pair in the games. • Does a single move in chess have a “reward”? • The reward is whether the game is won or lost - the combined result of the choice sequence
    • Evaluating Delayed Rewards • We need to evaluate what the expected payoff of a given choice is. • Typically we can only do this at the end of the game. • How can we decide what to do now if we won’t know if it was a good decision until later?
    • Chess • Opening move is one choice. • Opponent makes their move. • You reply. • Note that your 2nd move is a totally different theoretical “game” to the first move.
    • Chess • Initially there are 20 opening moves • Your opponent has 20 responding moves • 2 moves in, the size of the potential state space is 400 states. • The game gets more complicated later ‣ Average number of moves per turn: 35 ‣ Average game length: 80 • State space size (Shannon’s number): 35^80 ≈ 3 x 10^123 - HUGE
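The arithmetic behind these chess figures can be checked directly. The sketch below simply multiplies out the numbers quoted on the slide (20 opening moves, 35 moves per turn, 80 turns); the variable names are illustrative, and the result is a rough estimate of the game tree rather than an exact count of legal positions.

```python
import math

OPENING_MOVES = 20      # legal first moves for each side (from the slide)
AVG_BRANCHING = 35      # average number of moves per turn (from the slide)
AVG_GAME_LENGTH = 80    # average game length in turns (from the slide)

states_after_two_moves = OPENING_MOVES * OPENING_MOVES
game_tree_size = AVG_BRANCHING ** AVG_GAME_LENGTH   # exact Python integer

# Split the huge integer into mantissa and exponent for display.
exponent = math.floor(math.log10(game_tree_size))
mantissa = game_tree_size / 10 ** exponent

print(states_after_two_moves)
print(f"about {mantissa:.1f} x 10^{exponent}")
```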
    • Search • This state space is way too big for an exhaustive search approach like mini-max • Any brute force approach is not going to work • We need some mechanism to guide the search towards areas of the game tree that are useful
    • Heuristics • A heuristic is formally a “strategy using readily accessible, though loosely applicable, information to control problem solving in human beings and machines” • Less formally, it’s a guess-timate of the value of a state, typically based on the distance to the goal (planning) or likelihood of winning (games)
    • Using Heuristics • Heuristics guide search across spaces that are too complex to fully enumerate. • Estimate potential of the next set of states using the heuristic and go with the best looking one. • Can be combined with a search strategy like Best First Search or Enforced Hill Climbing
    • Heuristic Example - A* • A* search for path planning is a great example of heuristics in use. • In a world of tiles, find an optimal path from A to B. • A* uses two metrics: ‣ Concrete metric of the work to get to a location (g) ‣ Estimate of work to get from location to goal (h) • Search strategy always chooses the location that minimises (g + h)
    • Heuristic Example - A* [Animated figure across 26 slides: a tile grid with start A and goal B. Each tile the search reaches is labelled with its g + h value (1+7, 2+6, 3+5, 4+4, 5+3, 6+2, 7+1, ...) as the search expands from A towards B.]
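As a rough illustration of the search described on the A* slides, here is a minimal sketch for a 4-connected tile grid. The grid encoding (`#` for a blocked tile), the function name `a_star`, and the choice of Manhattan distance for h are assumptions made for this example; they are not taken from the lecture.

```python
import heapq

def a_star(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid is a list of equal-length strings; '#' marks a blocked tile
    (an assumed encoding for this sketch).
    """
    def h(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}                   # concrete cost to reach each tile
    parent = {start: None}
    frontier = [(h(start), start)]   # priority queue ordered by g + h
    closed = set()

    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        if current in closed:
            continue                 # stale queue entry, already expanded
        closed.add(current)
        row, col = current
        for r, c in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue
            if grid[r][c] == "#":
                continue
            new_g = g[current] + 1
            if new_g < g.get((r, c), float("inf")):
                g[(r, c)] = new_g
                parent[(r, c)] = current
                heapq.heappush(frontier, (new_g + h((r, c)), (r, c)))
    return None

grid = ["....",
        ".##.",
        "...."]
path = a_star(grid, (0, 0), (2, 3))
print(len(path) - 1, "steps:", path)
```

Because h never overestimates the remaining work, the first time the goal is popped from the queue the path found is optimal.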
    • Heuristics • Heuristics can guide our search • Help us understand what states are bringing us closer to our goals • Allow us to backtrack when a promising route becomes problematic • Do they work well for games?
    • The Maths of Choice • Common (basic) Combinatorics problem: ‣ How many X element sub-sets can I make from this set of Y elements? • Less formally: ‣ How many different ways can I pick X things from Y things?
    • Choice • We can refer to this as “Choosing” • “I have 5 things, I choose 2” • We can write it as: 5 C 2
    • Binomials • Mathematically, n C k is equivalent to the binomial coefficient • This can be re-written as ‣ n! / (k! (n - k)!)
    • Permutations • The choose operator tells you how many sets there are with unique elements. • What if the order that the elements are in matters? • For this we use Permutation ‣ n P k • Equivalent to: ‣ n! / (n - k)!
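Both operators are available in Python's standard library (`math.comb` and `math.perm`, Python 3.8 or later), so the "5 things, choose 2" example can be checked directly. The sketch also shows the relationship between the two: each unordered selection of k items can be arranged in k! orders.

```python
import math

n, k = 5, 2

subsets = math.comb(n, k)        # order ignored: n! / (k! (n - k)!)
arrangements = math.perm(n, k)   # order matters: n! / (n - k)!

print(subsets)        # 10
print(arrangements)   # 20

# Every k-element subset can be ordered in k! ways.
assert arrangements == subsets * math.factorial(k)
```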
    • Poker • Card game. • Typically involves gambling. • “Poker” is technically an entire set of different games that share similar structure. • For the purposes of this lecture, Poker refers specifically to “Limit Texas Hold ‘Em”
    • Texas Hold ‘Em • Variant of poker created in the early 1900s • Typically 2-10 player games • Popular recently - Poker on TV and online is typically Texas Hold ‘Em • Aim is to make the best 5 card hand possible using any combination of the two private “hole” cards and 5 public “community” cards
    • Phases of the Game • The game is broken into four phases. • Initial or “Pre-flop” - Hole cards are dealt and a round of betting occurs. • Flop - The first three community cards are dealt, another round of betting. • Turn - A fourth community card is dealt, and a round of betting • River - Final community card dealt, final betting
    • Some Terminology • Raise - Increase the bet amount • Fold - Give up on this game, losing any money already bet • Call - Put in an amount of money to equal what others are wagering • Blinds - An initial mandatory wager by two players, Small and Big. The players responsible for the blinds rotate each game.
    • Poker in Research • Poker has been a major research area for AI for many years. • Characteristics in common with many real world problems ‣ Hidden information ‣ Bluffing ‣ Loss minimisation
    • Poker at SAIG • Major research area for us for many years • Under my supervision for the last 2 years as honours projects and Summer internships. • Much of what you’re going to hear about this week is based on current research happening right now at SAIG
    • Strathclyde Poker Research Environment • SPREE was developed to overcome two challenges we face. ‣ Training data sets obtained from online casinos contain only imperfect information. This leads to bad machine learning ‣ Every research project wasted significant time re-implementing a framework for Poker • SPREE is an open source client/server implementation in Java, with an AI-based client and a GUI client. • http://sourceforge.net/projects/spree-poker
    • Limit or No Limit? • Two types of game - Limit and No Limit • No Limit - Classical movie Poker. ‣ Raises can be any amount ‣ Any number of raises • Limit - Common rule set ‣ Raises are a single fixed amount ‣ Limited number of raises allowed per round
    • Limit or No Limit? • Focus on Limit • Significantly reduces complexity of the problem. • Also means we can focus on the game, rather than the psychological aspects.
    • Poker State Space • At each point, each player typically has 3 options ‣ Raise, Call, Fold • We can approximate the size of the search space at point k as 3^k • We can also determine lower and upper bounds for k since in Limit there are a fixed number of raises.
    • Dealing Cards • For a game of N players, 2N + 5 cards are required. • There are 52 C (2N + 5) different sets of cards that could be dealt. • But who gets which card is important, so we need to use Permutation not Choose • 52 P (2N + 5) ‣ For a standard 10 player game - 52 P 25 ≈ 7.4 x 10^39
    • Length of a Poker Game (Lower) • In the shortest game possible, all players fold. • The last player (who put in the Big Blind) wins by default • N-1 choices to reach this point • 2N cards are required • 3^(N-1) * 52 P 2N • For a standard 10 player game: ‣ 19683 * 3 x 10^32 = 6 x 10^36
    • Length of a Poker Game (Upper) • In the longest game possible • All players initially call, final player to call instead raises. • 4N-4 turns per round, 4 rounds = 16N-16 turns total • 2N + 5 cards required • 3^(16N-16) * 52 P (2N + 5) • Again for a 10 player game ‣ 5 x 10^68 * 7.4 x 10^39 = 3.7 x 10^108
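The lower and upper bounds can be reproduced with a few lines of Python. This is a sketch of the slide arithmetic only: it assumes 3 options per turn, N-1 turns in the shortest game and 16N-16 turns in the longest, and the function names are illustrative.

```python
import math

def deal_orderings(players):
    """Ordered ways to deal 2 hole cards per player plus 5 community cards."""
    return math.perm(52, 2 * players + 5)

def lower_bound(players):
    """Shortest game: N-1 decisions, only the 2N hole cards are dealt."""
    return 3 ** (players - 1) * math.perm(52, 2 * players)

def upper_bound(players):
    """Longest game: 16N-16 decisions, all 2N + 5 cards are dealt."""
    return 3 ** (16 * players - 16) * deal_orderings(players)

PLAYERS = 10
print(f"deal orderings: {deal_orderings(PLAYERS):.2e}")
print(f"lower bound:    {lower_bound(PLAYERS):.2e}")
print(f"upper bound:    {upper_bound(PLAYERS):.2e}")
```

Python integers are arbitrary precision, so the products are exact; only the printed form is rounded.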
    • Total State Space Size • The total state space is smaller than Shannon’s number • Still completely unwieldy for any kind of exhaustive search • Note that we’ve considered the lower and upper bounds of the state space. • Actual values will typically fall somewhere between. • Also note that the upper bound hinges on the restrictions imposed by Limit, and we don’t need to consider any state complexity that a variable raise size would introduce.
    • Abstraction • There are some things we can do to trim this down (a bit) • Firstly, we can simplify our view of the starting position • We don’t need to consider all possible cards that could be dealt ‣ Cards that will help us change the situation ‣ Cards that don’t help us can be grouped together
    • Starting Hands • There are 52 C 2 = 1,326 potential opening hands • But we can reduce this ‣ Suit doesn’t matter except for matching ‣ We can reduce it to 2 card “suited” or “unsuited” ‣ 2c, 7d is equivalent to 2d, 7c or 2s, 7h • This gives a total number of abstract hands as ‣ 13 (pairs) + 13 C 2 (suited) + 13 C 2 (unsuited) = 169 • We’ll see tomorrow there are more abstractions.
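The 1,326 to 169 reduction can be verified by enumerating every two-card hand and collapsing it to its ranks plus a pair/suited/unsuited label. The card encoding (rank character followed by suit character, such as `2c`) and the helper name `abstract_hand` are assumptions made for this sketch.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]

def abstract_hand(card_a, card_b):
    """Collapse a concrete two-card hand to (high rank, low rank, kind)."""
    high, low = sorted((card_a[0], card_b[0]), key=RANKS.index, reverse=True)
    if high == low:
        kind = "pair"
    elif card_a[1] == card_b[1]:
        kind = "suited"
    else:
        kind = "unsuited"
    return high, low, kind

concrete_hands = list(combinations(DECK, 2))
abstract_hands = {abstract_hand(a, b) for a, b in concrete_hands}

print(len(concrete_hands))   # 1326
print(len(abstract_hands))   # 169
```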
    • Heuristics for Poker • “Every hand’s a winner and every hand’s a loser” • Heuristics for Poker are tricky because of this. • Analysis is largely based on your own hand - if my hand at a point is such-and-such a type or better, it is worth playing • Kind of naive
    • “Expert” Poker Systems • You can make a somewhat capable agent by combining a bunch of these naive heuristics. • It’s known which of the starting hands are strong and which are weak. • You can make a guess as to what you should do based on your hand strength. ‣ This is not massively informed • Basic, functional approach, attempts to lift out general rules that will lead to good results.
    • Evaluating Delayed Reward in Poker • I’ve mentioned delayed reward a few times • How does this fit into Poker? • We know that the strength of our hand alone won’t decide the game. • We know that opponents can bluff about their hand strength. • Need to find out “what happens if” for possible actions
    • Monte Carlo Tree Search • The underlying Monte Carlo method was initially used, without being formally defined, by Buffon and Fermi (among others) • Developed at Los Alamos by our Game Theory friend John von Neumann • For a large enough sample size, random sampling can often take the place of exhaustive enumeration
    • Samples and Probes • When we say a “random sample” we want to sample the potential outcomes ‣ And find the potential rewards • The leaf nodes of the game tree have the final value of the game. • By randomly walking from the current node to leaf nodes, we can build up a picture of where our actions might lead us.
    • Exploration vs Exploitation • We can sample at random, and we’ll get coverage in all areas • Some areas are more promising than others • We want to "exploit" these areas and inspect them closely ‣ Ensure that they are as good as they look • At the same time, we want to keep "exploring" in case there are better areas in the game tree. • Balancing these two contradictory goals falls to the UCT heuristic.
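UCT is commonly described as applying the UCB1 rule at each node of the tree: score each action by its average reward plus a bonus that grows when the action has been tried rarely. The sketch below assumes that formulation; the exploration constant sqrt(2) is a conventional default rather than a value from the lecture, and the function names and statistics are invented for illustration.

```python
import math

def uct_score(total_reward, visits, parent_visits, exploration=math.sqrt(2)):
    """Average reward (exploit) plus an uncertainty bonus (explore)."""
    if visits == 0:
        return float("inf")   # always try an untested action once
    exploit = total_reward / visits
    explore = exploration * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_action(stats):
    """stats maps action -> (total_reward, visits); return the best UCT score."""
    parent_visits = sum(visits for _, visits in stats.values())
    return max(stats, key=lambda action: uct_score(*stats[action], parent_visits))

# "raise" has the better average, but "call" has barely been sampled,
# so its exploration bonus wins this round.
stats = {"raise": (6.0, 10), "call": (1.0, 2)}
print(select_action(stats))
```

As an action accumulates visits its bonus shrinks, so sampling gradually concentrates on the actions whose averages hold up.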
    • Reward Evaluation • We can use the Monte Carlo samples to simulate down to the end of the game. • Establish whether we win or lose (and how much). • Bubble this value back up the tree. • Build a picture of the amount we can expect to win based on the actions we are considering this turn.
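A minimal sketch of this idea, using an invented toy betting game rather than real poker rules: each sample fixes the action under consideration, plays the rest of the game with random choices, and the final winnings are averaged per first action. The stake sizes, the 45% showdown win probability, and all names here are assumptions chosen only to make the example runnable.

```python
import random

ACTIONS = ("raise", "call", "fold")

def rollout(first_action, rng, rounds=3, win_probability=0.45):
    """Play one toy game to the end and return our net winnings."""
    staked = 0
    for round_number in range(rounds):
        # The first decision is the one being evaluated; later ones are random.
        action = first_action if round_number == 0 else rng.choice(ACTIONS)
        if action == "fold":
            return -staked    # give up whatever has already been bet
        staked += 2 if action == "raise" else 1
    # Showdown: the toy opponent has matched our stake.
    return staked if rng.random() < win_probability else -staked

def estimate_values(samples=10_000, seed=0):
    """Average the end-of-game result of many rollouts per first action."""
    rng = random.Random(seed)
    values = {}
    for action in ACTIONS:
        total = sum(rollout(action, rng) for _ in range(samples))
        values[action] = total / samples
    return values

values = estimate_values()
for action, value in values.items():
    print(f"{action}: {value:+.2f}")
```

In this toy setup the showdown is slightly unfavourable, so the averages for raising and calling come out negative; a real agent would replace the random later choices and the fixed win probability with a model of the game and its opponents.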
    • Caveat Emptor • What we’ve seen today is just ONE approach to tackling Poker. • It’s an open challenge in AI to find a good solution • The techniques used are important • More important is the reasoning for using these approaches. • AI as a toolkit, not a definitive solution.
    • Sampling the St Petersburg Paradox (payout: number of games) ‣ 1: 2,147,483,647 ‣ 2: 834,532,607 ‣ 4: 435,781,603 ‣ 8: 222,566,052 ‣ 16: 108,347,756 ‣ 32: 54,225,257 ‣ 64: 27,184,330 ‣ 128: 13,605,016 ‣ 256: 6,792,164 ‣ 512: 3,393,086 ‣ 1024: 1,698,228
    • Sampling the St Petersburg Paradox • If we repeatedly play out the St Petersburg game we see that it behaves much as we expect • Half the games end immediately, a quarter after 1 turn and so on. • Over 4,000,000,000 samples, the average payout is only £14.50 • Where the Expected Value metric didn’t inform our decision making, we can use sampling to see what actually happens!
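The sampling experiment is easy to reproduce on a smaller scale. This sketch assumes the payout starts at 1 and doubles for every coin flip that lets the game continue, which matches the payout column on the previous slide; the function name, sample size, and seed are illustrative.

```python
import random

def play_st_petersburg(rng):
    """Play one game: the payout doubles for each flip that continues the game."""
    payout = 1
    while rng.random() < 0.5:
        payout *= 2
    return payout

rng = random.Random(0)
results = [play_st_petersburg(rng) for _ in range(100_000)]

average = sum(results) / len(results)
ended_immediately = results.count(1) / len(results)

print(f"average payout: {average:.2f}")
print(f"share of games ending immediately: {ended_immediately:.3f}")
```

The sampled average stays modest even though the theoretical expected value is unbounded, because the very large payouts are too rare to appear often in any finite sample.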
    • Summary • Understanding real games • Delayed reward systems • Poker • Monte Carlo with UCT (in brief)
    • Next Lecture • More on Monte Carlo • Describing a player mathematically • Categorising players into types • Using this classification for better decisions