# AI Strategies for Solving Poker Texas Hold'em

A discussion of state-of-the-art algorithms for artificial intelligence strategies for solving the game of poker Texas Hold'em.


### AI Strategies for Solving Poker Texas Hold'em

1. AI Strategies for Solving Poker Texas Hold'em. Report for Elective in Artificial Intelligence, section: AI and Games. Professor: Marco Schaerf. Student: Giovanni Murru.
2. Overview
   - Computer Poker Strategies
   - Abstractions
   - Exploitative counter strategies
   - Near-equilibrium Poker Agents
   - Human behavior in a Simplified Poker Game
3. Computer Poker Strategies
   Poker contains an enormous strategy space, imperfect information, and stochastic events: all elements that characterize the most challenging problems in multi-agent systems and game theory.
4. The Game of Poker Texas Hold'em
   - Card game played with a 52-card French deck.
   - 2 up to 10 players.
   - Each player receives 2 hole cards.
   - The game is divided into 4 main betting rounds:
     1. Preflop
     2. Flop
     3. Turn
     4. River
   - Possible actions: FOLD, CHECK/CALL, BET/RAISE, JAM (or ALL-IN).
   - Variants: limit/no-limit, tournament/cash game.
   - Score: best hand made from the 2 hole cards plus the community cards.
5. Possible Actions in Texas Hold'em
   - Fold: the player leaves the hand.
   - Check/Call: the player remains in the game without committing further chips to the pot (check), or covers the current bet in order to stay in (call).
   - Bet/Raise: a bet is made when a player wants to invest further chips in the pot; a raise replies to a bet with a higher bet (e.g. 3x the bet).
   - Jam: also known as all-in, it consists in putting all of the player's remaining chips in the pot (note: it can be done even if the amount of chips is smaller than the amount necessary to call).
6. Limit vs No-Limit
   - LIMIT
     - It is not possible to go all-in.
     - Fixed maximum amount for the bets.
     - The constraints limiting the bets reduce the state space: about 10^18 states.
   - NO-LIMIT
     - It is possible to go all-in (jam).
     - Very popular; it is the variant played in international competitions such as the WSOP.
     - About 10^71 game states.
7. Tournament vs Cash Game
   - TOURNAMENT
     - Each player pays a fixed buy-in to participate in the tournament.
     - Prizes are usually awarded to the first, second, and third ranked players.
     - Ante: a fixed mandatory bet for each player, introduced once the blinds are high enough.
   - CASH GAME
     - Each chip used by the player has a corresponding value in real money.
     - Players invest real money in the pot.
     - Strategies are different.
     - Less popular than tournaments because of its intrinsically higher risk.
8. Decision Factors
   1. Hole cards
   2. Player's position
   3. Stacks
   4. Number of players
   5. Pot odds
   6. Opponent's strategy
9. David Sklansky Table
   - David Sklansky created a table that divides the best starting hands into groups:

   | Group | Hands |
   |---|---|
   | 1st | AA, KK, QQ, JJ, AKs |
   | 2nd | TT, AQs, AJs, KQs, AK |
   | 3rd | 99, JTs, QJs, KJs, ATs, AQ |
   | … | … |
   | 8th | 87, A9, Q9, 76, 42s, 32s, 96s, 85s, J8, J7s, 65, 54, 74s, K9, T8, 43 |

   - As a rule of thumb, it is safe to fold a hand in the 8th group; on the contrary, a player should never fold a hand in the 1st group.
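The group-based preflop rule can be sketched in code. This is a hypothetical illustration: the `SKLANSKY_GROUPS` dictionary covers only the hands shown on the slide, and the advice strings are invented for the example.

```python
# Partial, illustrative subset of the Sklansky hand groups from the slide.
SKLANSKY_GROUPS = {
    "AA": 1, "KK": 1, "QQ": 1, "JJ": 1, "AKs": 1,
    "TT": 2, "AQs": 2, "AJs": 2, "KQs": 2, "AK": 2,
    "99": 3, "JTs": 3, "QJs": 3, "KJs": 3, "ATs": 3, "AQ": 3,
    "87": 8, "A9": 8, "Q9": 8, "76": 8, "42s": 8, "32s": 8,
}

def preflop_advice(hand, fold_threshold=8):
    """Rough preflop recommendation based on the hand's Sklansky group."""
    group = SKLANSKY_GROUPS.get(hand)
    if group is None:
        return "unknown"          # hand not in this partial table
    if group == 1:
        return "never fold"       # premium hands, per the slide
    if group >= fold_threshold:
        return "safe to fold"     # 8th group: safe to fold
    return "play with caution"
```

A lookup like `preflop_advice("AA")` then encodes the slide's rule of thumb directly.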
10. Opponent Human Strategies
   - Loose-Passive: the least dangerous players. They always call, play too many hands, and never raise or bet.
   - Loose-Aggressive: unpredictable but easily exploitable; they tend to raise even with weak hands.
   - Tight-Passive: they never raise; they play few hands, and when they do they call.
   - Tight-Aggressive: they play few hands, like tight-passive players, but when they have a good hand they raise, and maybe raise again. They are the most dangerous category.
   - Good players are those who can adapt their style to one of these categories depending on the current situation.
11. Extensive Form Game
   An extensive form game models interactions between multiple autonomous agents by describing sequential decision-making scenarios in situations where imperfect information and non-determinism may emerge.
12. EFGs' Representation
   - An EFG can be represented by a game tree:
     - Each (choice) node has an associated player who takes a decision.
     - The directed edges leaving a node represent the possible actions for that player; if the node is a chance node, they represent the possible chance outcomes.
     - Each terminal node has associated utilities for all the players in the game.
   - Game states are partitioned into information sets.
   - An information set is a set of game states that are indistinguishable by the player.
13. Computer Poker Strategies
   - Strategies are classified as:
     - PURE: choose a unique action during the game (e.g. always-fold, always-raise, …).
     - MIXED: assign a probability value to each pure strategy (e.g. probabilities to fold, check/call, or bet/raise: (f, c, r)).
   - An information set is a set of game states that are indistinguishable by the players.
   - A strategy is a mapping between information sets and actions.
   - Each strategy profile has a corresponding utility.
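A mixed strategy can be represented directly as a distribution over the pure actions. A minimal sketch, with arbitrary (f, c, r) weights that are not taken from the report:

```python
import random

def sample_action(mixed_strategy, rng):
    """Draw one pure action according to the mixed strategy's probabilities."""
    actions = list(mixed_strategy)
    weights = [mixed_strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Illustrative (f, c, r) mixed strategy; the probabilities must sum to 1.
strategy = {"fold": 0.2, "call": 0.5, "raise": 0.3}
assert abs(sum(strategy.values()) - 1.0) < 1e-9

rng = random.Random(0)
counts = {a: 0 for a in strategy}
for _ in range(10_000):
    counts[sample_action(strategy, rng)] += 1

# Empirical frequencies approach the mixed-strategy probabilities.
freq_call = counts["call"] / 10_000
```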
14. Equilibrium Strategy
   - Nash equilibrium as a solution of the game: no player can improve their utility by individually changing their strategy.
   - A best response is the strategy that obtains the highest expected utility for a player against the other players' strategies in the profile.
   - Strategies in a Nash equilibrium are each a best response to the other strategies in the profile.
   - Texas Hold'em is a very large extensive game: it is intractable to compute an exact Nash equilibrium.
   - This motivates the approximated ε-Nash equilibrium: a strategy profile in which no player can increase their utility by more than ε by unilaterally changing their strategy.
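The ε-Nash condition can be checked concretely on a tiny zero-sum game. A minimal sketch using matching pennies rather than poker: ε is the largest gain either player can obtain by deviating unilaterally.

```python
# Row player's payoff matrix for matching pennies (zero-sum: column player
# receives the negation).
A = [[1, -1],
     [-1, 1]]

def row_value(x, y):
    """Expected payoff to the row player under mixed strategies x, y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

def epsilon_of(x, y):
    """Largest unilateral gain available to either player (exploitability)."""
    # Row player's best pure-response payoff against y.
    best_row = max(sum(A[i][j] * y[j] for j in range(2)) for i in range(2))
    # Column player minimizes the row payoff; best pure response against x.
    best_col = min(sum(x[i] * A[i][j] for i in range(2)) for j in range(2))
    v = row_value(x, y)
    return max(best_row - v, v - best_col)

uniform = [0.5, 0.5]
eps_eq = epsilon_of(uniform, uniform)      # exact equilibrium: epsilon = 0
eps_bad = epsilon_of([0.9, 0.1], uniform)  # a skewed profile is exploitable
```

A profile is an ε-Nash equilibrium exactly when `epsilon_of` returns a value no larger than ε.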
15. Abstractions
   Abstractions are simplifications of the game aimed at reducing its huge space of information sets. The solutions computed on the abstracted game can be mapped back to strategy profiles in the original game.
16. Kinds of Abstractions
   - Card abstraction:
     - Hands are divided into buckets based on a certain metric.
     - Metrics: Expected Hand Strength, E[HS], or E[HS^2].
     - Hands belonging to the same bucket are played in the same way.
   - Memory abstraction:
     - Imperfect recall (i.e. forget old information).
   - Bet abstraction:
     - Betting round reduction (e.g. limit the number of possible raises to 3).
     - Elimination of betting rounds (e.g. eliminate the river betting round).
     - Betting abstractions (e.g. eliminate some betting options).
17. Action Abstraction (John Hawkin, Robert Holte, Duane Szafron, University of Alberta)
   - Introduce the all-in action.
   - Join the bet actions together and differentiate between 2 possible choices:
     - Low Bet (L)
     - High Bet (H)
   - L and H depend on the player's stack.
   - In the game trees, a circle denotes player one's decision point, a square denotes player two's decision point, and a diamond denotes the bet sizing player.
   - The multiplayer betting game transformation can be applied to domains whose actions have associated real or multi-valued parameters. In poker the action is a bet and the parameter is its size; combined, they yield a large number of actions of the form "bet x", where x is an integer between some minimum value and the stack size of the betting player. At each decision point where betting is legal, most of the betting actions are removed, leaving only an all-in action and a new "bet" action. The bet action is followed by a decision point for a new player, the "bet sizing player", who chooses between a low bet L and a high bet H.

   [Figure 1: a decision point in a regular no-limit game (fold, call, bet 2 … bet 6) and the multiplayer betting version of the same decision point (fold, call, bet → L/H, all-in = bet 6).]

   - "A successful abstraction technique is one that creates a smaller game retaining the important strategic features of the original game."
18. Automated Action Abstraction
   - Normalized probabilities: P(H) + P(L) = 1.
   - Effective bet size function: B(P(H)) = (1 − P(H))·L + P(H)·H.
   - Lemma 1: for any fixed values of L and H and probability P(H)_i of player i betting H, the expected pot size after that bet, for all opponents j, is p + B(P(H)_i)·p, where p is the pot size before the bet.
   - A mapping exists between a given strategy profile ζ of a multiplayer betting game and a strategy profile ζ' for the abstract poker game.
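The effective bet size and Lemma 1 translate directly into code. A minimal sketch, with the pot measured in chips and L, H expressed as fractions of the pot (an assumption made for the example):

```python
def effective_bet(p_high, low, high):
    """Expected bet size B(P(H)) = (1 - P(H)) * L + P(H) * H."""
    assert 0.0 <= p_high <= 1.0
    return (1.0 - p_high) * low + p_high * high

def expected_pot_after_bet(pot, p_high, low, high):
    """Lemma 1: expected pot size after the bet, p + B(P(H)) * p."""
    return pot + effective_bet(p_high, low, high) * pot

# Example: L = half pot, H = full pot, mixed with equal probability.
b = effective_bet(0.5, 0.5, 1.0)                   # 0.75 of the pot
pot = expected_pot_after_bet(100, 0.5, 0.5, 1.0)   # 100 + 0.75 * 100 = 175
```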
19. Automated Action Abstraction
   - Half-street no-limit Kuhn poker:
     - Cards: Jack, Queen, King; only 1 card per player.
     - The game ends after one betting round.
     - The 2nd player cannot bet or raise (half-street).
   - Researchers found that s = B(P(H)) = 0.414 is the bet size that achieves a Nash equilibrium strategy profile.
   - For larger games, further abstractions are needed.
20. Exploitative Counter Strategies
   An exploitative counter strategy is one that tries to benefit from an opponent's weaknesses, deviating from the equilibrium point to achieve higher payoffs.
21. Exploit the Opponent
   - Construct a model of the opponent's behavior.
   - Training phase playing the equilibrium strategy.
   - Design teams of counter strategies able to exploit all the different styles of opponents.
   - CAUTION: opponents may trick the exploiter by playing a certain strategy for several iterations. It is the paradox of the exploiter being exploited.
22. DBBR: Real-Time Opponent Modeling (Sam Ganzfried, Tuomas Sandholm, Carnegie Mellon University)
   - Deviation-Based Best Response (DBBR):
     - Computes the best response to the opponent model.
     - Uses a recorded public history of the opponent's action frequencies.
   - Variations used:
     - DBBR-WS: weight shifting of probability.
     - DBBR-L1: weight distance using the L1 norm.
     - DBBR-L2: weight distance using the L2 norm.
23. DBBR: Under the Hood
   - A recorded public history of the opponent's action frequencies is used to compute the opponent's posterior action probabilities α_{n,a}, obtained by observing how often the opponent chooses action a at each public history set n:

   α_{n,a} = (π*_{n,a} · N_prior + c_{n,a}) / (N_prior + Σ_{a'} c_{n,a'})

   where π*_{n,a} denotes the probability that the equilibrium strategy σ* plays action a at public history set n, c_{n,a} are the observed action frequencies, and N_prior is the weight of a Dirichlet prior distribution.
   - Next, the algorithm computes the posterior bucket probabilities, i.e. the probabilities that the opponent is in each bucket at each public history set n.
   - The DBBR algorithm then iterates over all public history sets and returns as output the best response to the opponent model.
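The posterior update above can be sketched as follows; the action names and numeric values are illustrative, not taken from the paper:

```python
def posterior_action_probs(pi_star, counts, n_prior):
    """alpha_{n,a} = (pi*_{n,a} * N_prior + c_{n,a}) / (N_prior + sum_a' c_{n,a'})."""
    total = n_prior + sum(counts.values())
    return {a: (pi_star[a] * n_prior + counts[a]) / total for a in pi_star}

# Equilibrium prior at one public history set, and observed opponent counts.
pi_star = {"fold": 0.2, "call": 0.5, "raise": 0.3}
counts = {"fold": 0, "call": 2, "raise": 8}
alpha = posterior_action_probs(pi_star, counts, n_prior=5)
# The posterior is a proper distribution shifted toward the observations:
# with few observations it stays near pi_star; with many it tracks the counts.
```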
24. DBBR: Computing the Opponent Model
   - Find a strategy that is closer to the equilibrium: such a strategy will be less exploitable.
   - A weight-shifting algorithm makes use of the approximated bucket ranking, computed at each public history set n from the approximated equilibrium ζ*.
   - Suppose the opponent is taking action 3 more often than he should at n (i.e. α_{n,3} ≥ γ_{n,3}):
     - The algorithm starts adding weight to the buckets that play action 3.
     - The increase in the probability of playing action 3 is compensated by decreasing the probabilities of playing actions 1 and 2.
     - The shift is repeated until the inequality is satisfied.
   - Parameters used for the tests: T = 1000, k = 50, N_prior = 5.
25. DBBR: Experimental Results
   - Two-player limit Texas Hold'em (about 10^18 states).
   - Opponents: the 2 worst players in the 2008 AAAI Computer Poker Competition, GUS2 and Dr. Sahbak.
   - Comparison with GS5, a bot used in the 2009 AAAI competition that plays an ε-equilibrium strategy.
   - DBBR-WS outperforms the other agents.
26. Computing Equilibria in Stochastic Games
   The approach described in this section considers the computation of equilibrium solutions using game-theoretic principles.
27. Stochastic Games
   - A stochastic game is a tuple (N, S, A, p, r):
     - N = {1, …, n} is a finite set of players.
     - S is a finite set of states.
     - A(s) = (A_1(s), …, A_n(s)) is a tuple where A_i(s) is the set of actions available to player i at state s.
     - p_{s,t}(a) is the probability of transitioning from state s to state t when the joint action a is taken.
     - r(s) is the payoff function, a vector whose elements denote the payoff of the i-th player at state s. The payoff can be positive (gain), negative (loss), or zero (draw).
   - At each state s the i-th player chooses a probability distribution over his own action set A_i(s). The probability of selecting action a_i is denoted σ(a_i), and for a joint action a we define σ(a) = Π_i σ(a_i).
   - The transition function extends to mixed profiles as:

   p_{s,t}(σ) = Σ_{a ∈ A(s)} σ(a) · p_{s,t}(a)

   - The goal of each agent in a stochastic game is to maximize its discounted payoff:

   Σ_{t=0}^{∞} γ^t · r_i(s_t)
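The extended transition function can be sketched for a toy two-player state. The joint-action transition probabilities below are invented for illustration:

```python
def joint_prob(sigma_i, a):
    """sigma(a) = prod_i sigma_i(a_i) for a joint action a (a tuple)."""
    prob = 1.0
    for player, action in enumerate(a):
        prob *= sigma_i[player][action]
    return prob

def transition(sigma_i, p_a):
    """p_{s,t}(sigma) = sum over joint actions a of sigma(a) * p_{s,t}(a)."""
    return sum(joint_prob(sigma_i, a) * p_a[a] for a in p_a)

# Two players, two actions each; p_a maps joint actions to P(s -> t).
p_a = {("stay", "stay"): 0.1, ("stay", "go"): 0.4,
       ("go", "stay"): 0.6, ("go", "go"): 0.9}
# Both players mix uniformly, so each joint action has probability 0.25.
sigma = [{"stay": 0.5, "go": 0.5}, {"stay": 0.5, "go": 0.5}]
p_st = transition(sigma, p_a)   # average of the four entries
```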
28. Finding an ε-Nash Equilibrium (Sam Ganzfried, Tuomas Sandholm, Carnegie Mellon University)
   - Several algorithms compute an approximated equilibrium solution in stochastic games of imperfect information: VI-FP, PI-FP, FP-MDP.
   - They compute an approximated jam/fold equilibrium in a three-player tournament.
   - All the algorithms can be logically divided into two parts: an inner loop and an outer loop.
29. Standard Fictitious Play
   - In fictitious play, each player plays a best response to his opponents' strategies.
   - The smoothed version of fictitious play updates the strategy of player i according to a simple learning model:

   s_i^t = (1 − 1/t) · s_i^{t−1} + (1/t) · s'_i^t

   where s'_i^t is the best response of player i to the profile s_{−i}^{t−1}, the set of its opponents' strategies at the previous time step.
   - The algorithm converges to a Nash equilibrium for 2-player zero-sum games, but the same is not true for multi-player or general-sum games.
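The smoothed update can be run on matching pennies, where the 2-player zero-sum convergence guarantee applies. A minimal sketch (the payoff matrix, starting strategies, and iteration count are choices made for the example):

```python
A = [[1, -1], [-1, 1]]   # row player's payoffs in matching pennies

def best_response_row(y):
    """Pure best response of the row player to the column strategy y."""
    payoffs = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
    br = [0.0, 0.0]
    br[payoffs.index(max(payoffs))] = 1.0
    return br

def best_response_col(x):
    """Pure best response of the column player (who minimizes the row payoff)."""
    payoffs = [sum(x[i] * A[i][j] for i in range(2)) for j in range(2)]
    br = [0.0, 0.0]
    br[payoffs.index(min(payoffs))] = 1.0
    return br

x, y = [1.0, 0.0], [1.0, 0.0]   # arbitrary pure starting strategies
for t in range(2, 20_001):
    bx, by = best_response_row(y), best_response_col(x)
    # Smoothed update: s^t = (1 - 1/t) * s^{t-1} + (1/t) * best_response.
    x = [(1 - 1 / t) * x[i] + (1 / t) * bx[i] for i in range(2)]
    y = [(1 - 1 / t) * y[j] + (1 / t) * by[j] for j in range(2)]
# x and y end near the mixed equilibrium [0.5, 0.5].
```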
30. VI-FP vs PI-FP
   - Initialization using the ICM heuristic.
   - The inner loop uses an extension of smoothed fictitious play (FP) and determines an ε-equilibrium s*.
   - The outer loop updates the values either with value iteration (VI-FP) or with policy iteration (PI-FP).
   - The inner and outer loops are executed until no state's payoff changes by more than δ between outer-loop iterations.
   - M is the Markov Decision Process (MDP) induced by the strategy profile s* computed with VI-FP.
   - PI-FP can recover from a poor initialization because it uses the values resulting from evaluating the policy.
31. Policy Iteration
   - Policy iteration is an alternative way to find an optimal policy π*.
   - π_0 is initialized to the near-optimal policy s*.
   - States in poker tournaments are vectors containing the stack size of each player, e.g. a 3-player state = (x_1, x_2, x_3), where:
     - x_1: stack of the button
     - x_2: stack of the small blind
     - x_3: stack of the big blind
32. PI-FP
   - PI-FP uses policy iteration (PI) as the outer loop and fictitious play (FP) as the inner loop.
   - It differs from VI-FP in how the values are updated in the outer loop.
   - An optimal policy for M corresponds to a best response of each player in G to the profile s*.
33. FP-MDP
   - Reverses the roles of the loops: fictitious play becomes the outer loop and solving the Markov Decision Process the inner loop.
   - Policy iteration is used to initialize the strategies.
   - The MDP is induced and constructed by the strategy profile s_i.
   - Strategies are updated using the fictitious play updating rule.
   - Repeat until the sequence {s_n} converges.
   - Convergence is not guaranteed, but if PI-FP or FP-MDP converge, the final strategy profile is an equilibrium. This is not guaranteed for VI-FP.
34. VI-FP vs PI-FP vs FP-MDP
   - The comparison measures the running time required to find an approximated equilibrium solution.
   - Experimental results show that PI-FP outperforms the older VI-FP algorithm.
   - Interestingly, the algorithms converged to an equilibrium despite the fact that they are not guaranteed to do so.
   - The graph was generated by running an ex-post check on the strategies computed at the end of each outer-loop iteration.
35. Near-Equilibrium Poker Agents
   This section examines some recent, state-of-the-art techniques used to compute a near-Nash equilibrium. In all cases the agents used abstractions.
36. Fictitious Play vs Range of Skill
   - Fictitious play is centered around the idea that the strategies of 2 players repeatedly playing against each other can improve and adapt over time.
     - It starts with players adopting random strategies (a training phase).
     - As the number of iterations increases, the computed strategies approach a Nash equilibrium: it is a form of learning.
   - Range of skill is an iterative procedure that creates a sequence of agents, where each new agent employs a strategy that beats the previously created agent by at least ε.
     - As the number of agents in the sequence increases, the agents' strategies approach an ε-Nash equilibrium.
     - The algorithm repeatedly calls a procedure named generalized best response.
37. CFR: Counterfactual Regret Minimization (Nick Abou Risk, Duane Szafron, University of Alberta)
   - Counterfactual: calculations are weighted by the probability of reaching a particular information set.
   - Regret: the difference between the highest utility achievable and the utility gained with the action that was actually taken.
   - CFR minimizes overall regret, which leads to an ε-Nash equilibrium profile.
38. Kuhn & Leduc Hold'em: 3-Player Variants
   - Kuhn poker is a game invented in 1950 that features bluffing, inducing bluffs, and value betting. The 3-player variant used for the experiments:
     - Deck of 4 cards of the same suit (K > Q > J > T).
     - Each player is dealt 1 private card.
     - Ante of 1 chip before the cards are dealt.
     - One betting round with a 1-bet cap.
     - If there is an outstanding bet, the player can fold or call; otherwise the player can check or bet 1 chip.
   - Leduc Hold'em differs in its multi-round play, community cards, rounds with different bet sizes, and raising. The 3-player variant:
     - Deck of 8 cards: 4 ranks and 2 suits (K > Q > J > T).
     - Each player is dealt 1 private card.
     - Antes of 1 chip before the cards are dealt.
     - Two betting rounds with different bet sizes: 2 chips preflop, 4 chips on the flop.
     - Upper limit of 2 bets per betting round.
     - 1 community card is dealt before the flop betting round.
     - Rankings: a paired hand beats an unpaired one; if there are no pairs, the highest card wins.
39. CFR for Computing an ε-Nash Equilibrium
   - CFR iteratively computes strategies that converge to an ε-Nash equilibrium by minimizing a counterfactual regret value.
   - Advantage: memory linear in the size of the information sets.
   - It separates the regret values into the different information sets and minimizes their individual regret values.
   - It is theoretically valid for 2-player zero-sum perfect-recall games; however, recent research has brought CFR to multiplayer games.
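The per-information-set update at the heart of CFR is regret matching. A minimal sketch against a fixed opponent in rock-paper-scissors; this illustrates only the local regret-matching update, not the counterfactual weighting over a full game tree:

```python
def payoff(my, opp):
    """Rock-paper-scissors payoff: 0=rock, 1=paper, 2=scissors."""
    if my == opp:
        return 0
    return 1 if (my - opp) % 3 == 1 else -1

def regret_matching(regrets):
    """Normalize positive cumulative regrets into the next strategy."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1 / 3] * 3          # uniform when no positive regret
    return [p / total for p in positives]

regrets = [0.0, 0.0, 0.0]
avg = [0.0, 0.0, 0.0]               # average strategy over all iterations
opp = 1                             # opponent always plays paper
T = 5000
for _ in range(T):
    strat = regret_matching(regrets)
    for a in range(3):
        avg[a] += strat[a] / T
    # Regret of each action: its payoff minus the current strategy's payoff.
    u = sum(strat[a] * payoff(a, opp) for a in range(3))
    for a in range(3):
        regrets[a] += payoff(a, opp) - u
# avg concentrates on scissors, the best response to paper.
```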
40. Evaluating CFR Agents
   - Computing strategies for 3-player limit Texas Hold'em (about 10^24 game states) takes weeks or months of computational time!
   - Using simplified poker games:
     - Kuhn: an equilibrium can be found.
     - Leduc Hold'em: no convergence to an equilibrium is found.
   - It is impossible to find an equilibrium in complex games, so how can CFR agents be evaluated on Texas Hold'em?
   - Use benchmark agents and let them play millions of hands.
41. The Agents Used in the Experiments
   - Benchmark agents:
     - Poki: winner of the multiplayer limit event of the 2008 Computer Poker Competition.
     - Chump agents: Always-Fold, Always-Call, Always-Raise.
     - Probe: a chump agent that chooses with equal probability between call and raise.
   - CFR-designed agents:
     - PR2: 2-bucket, perfect recall; 32 GB machine; 20 million iterations (~3 weeks); betting upper limit on the river reduced to 3.
     - IR16x: 16-bucket, imperfect recall; 64 GB machine.
       - IR16S: 20 million iterations.
       - IR16L: 43 million iterations (~1 month).
   - All three of the CFR-generated agents outperform Poki; PR2 was the best exploitative agent (it obtained 1086 mb/hand against Always-Call and Always-Raise).
42. Agents Evaluation
   - Bankroll events: players are ranked based solely on their win rate.
   - Elimination events: the winner is determined by recursively removing the worst agent; the top 3 players are ranked by win rate.
43. Heads-Up Experts (HUEs)
   - When 1 player folds in the 3-player game, switch to heads-up experts.
   - Note: in two-player zero-sum perfect-recall games, CFR is guaranteed to find an ε-Nash equilibrium.
   - Problem: how to locate the points of the game at which heads-up play emerges?
   - Solution: compute the frequencies of betting sequences that lead to a fold, from 1.2 million hands of self-play between 3 identical agents.
   - Uniform vs expert seeding of the HUE subtrees.
   - Hands are placed in buckets with ranking 5 > 4 > 3 > 2 > 1.
44. Heads-Up Experts (HUEs)
   - In 3-player Texas Hold'em, as soon as 1 player folds, the game becomes heads-up.
   - IDEA: locate the point of the game at which heads-up play emerges and use HUEs there.
   - HOW? Compute the frequencies of the betting sequences that lead to a fold.
   - WHY? Because in 2-player zero-sum perfect-recall games, CFR is guaranteed to find an ε-Nash equilibrium.
45. Human Behavior in a Simplified Poker Game
   In a recent paper, researchers in behavioral decision making present an experiment involving 120 human players competing in a pure-strategy simplified poker game against a computer programmed to play either the equilibrium strategy or fictitious play. An interesting result of this research is that humans made a considerable number of mistakes, betting when they should have checked or calling when they should have folded.
46. Experiments in the Past
   - 1938, Borel: proposed and solved a 2-player poker game.
     - Ante of 1 chip before dealing the cards.
     - Cards: a random number in the interval [0, 1].
     - After the cards are dealt, P1 can bet or fold. If P1 folds, P2 wins the pot; if P1 bets, P2 can call or fold.
     - The solution exhibits a single threshold value for deciding whether to bet or not.
   - 1947, Von Neumann and Morgenstern: a game differing from Borel's only in that:
     - After the cards are dealt, P1 can bet or check (no fold).
     - If P1 checks, the game ends with no decision for P2: the highest card wins.
     - If P1 bets, P2 can call or fold (as in Borel's game).
     - The optimal policy is threshold-based: bet with the strongest hands, bluff with the weakest.
47. The Game and the Experiment
   - THE GAME:
     - Hands are dealt from a deck containing only 7 ordered cards: {2, 3, 4, 5, 6, 7, 8}.
     - Ante of 1 chip; the bet is fixed at 2 chips.
     - After the cards are dealt, P1 can check or bet. If P1 checks, the game ends; if P1 bets (2 chips), P2 can fold or call.
     - Optimal solution:
       - P1: bet (bluff) with {2}, check with {3, 4, 5, 6}, bet with {7, 8}.
       - P2: fold with {2, 3, 4, 5}, call with {6, 7, 8}.
     - Following optimal play, the game favors P1.
   - THE EXPERIMENT:
     - 120 human subjects (approximately the same percentage of males and females).
     - Each plays against a computer running either fictitious play, designed to take advantage of weak opponents, or the equilibrium solution.
     - Subjects were motivated by a potential award of $25.
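Because the game is tiny, the claim that optimal play favors P1 can be verified by enumerating every deal. A minimal sketch of that computation:

```python
from itertools import permutations

CARDS = [2, 3, 4, 5, 6, 7, 8]
P1_BETS = {2, 7, 8}       # P1 bets with {7, 8} and bluffs with {2}
P2_CALLS = {6, 7, 8}      # P2 calls with {6, 7, 8}, folds otherwise

def p1_net(c1, c2):
    """P1's net chips for one deal (1-chip antes already committed)."""
    if c1 not in P1_BETS:                 # P1 checks: showdown for the antes
        return 1 if c1 > c2 else -1
    if c2 not in P2_CALLS:                # P1 bets, P2 folds: P1 takes the pot
        return 1
    return 3 if c1 > c2 else -3           # bet is called: 3 chips each at risk

deals = list(permutations(CARDS, 2))      # all ordered (c1, c2) deals
ev = sum(p1_net(c1, c2) for c1, c2 in deals) / len(deals)
# ev is positive (2/21 of a chip per deal): the game favors P1.
```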
48. Behavioral and Cognitive Learning Theory
   - Behavioral Learning Theory (BLT) states that subjects use different heuristics (or mental models) to solve the game.
   - The choice of mental model is influenced by the information about the game available to the player; e.g. providing information on the opponent's threshold increases the amount of bluffing during the game.
   - The art of bluffing is not part of the initial mental model and must be learned.
   - BLT deals with the optimization of play within a selected mental model; Cognitive Learning Theory (CLT) deals with the selection of the right mental model.
   - The interplay between BLT and CLT is what makes this study interesting.
49. Details of the Experiment
   - The computer played ES and FP each half of the time.
   - Subjects were randomly assigned to 4 conditions:

   | Condition | Human's role | Machine's strategy |
   |---|---|---|
   | P1-FP | First to play | Fictitious play |
   | P1-ES | First to play | Equilibrium solution |
   | FP-P2 | Second to play | Fictitious play |
   | ES-P2 | Second to play | Equilibrium solution |

   - Players did not know the strategy employed by the computer, nor that the game would last 200 trials.
   - BLT predicts that subjects playing against FP should have a faster learning rate than those playing against ES.
   - CLT predicts that subjects playing the role of P2 should always have a faster learning rate than those playing the role of P1.
50. Results of the Research (1/3)
   - Measured the proportion of players' decisions consistent with the equilibrium solution.
   - Actions most consistent with the ES: the bet action for subjects in the role of P1, and the call action for subjects in the role of P2.
   - Players' decision behavior did not change (adapt): they failed to approach equilibrium play with repetitions of the game.
   - 200 trials were not enough to learn the correct behavior.
51. Results of the Research (2/3)
   - The likelihood of betting increases monotonically with the value of the hand, with an exception for the card 2: several subjects understood bluffing.
   - There were many non-equilibrium decisions, and they did not decrease during the trials.
   - Individual differences appeared, as predicted by the heterogeneity of mental models described by CLT; other aspects of BLT and CLT were not proved.
   - Some irrational subjects tried to fool the computer. However, ES ignored irrational play, while FP exploited it.
52. Results of the Research (3/3)
   - Subjects' decisions were very far from the optimal equilibrium solution, and they did not improve during the 200 trials.
   - Most of the errors: P1 did not bluff with the lowest card; P2 called on intermediate cards when they should have folded.
   - As in previous studies, players failed to bluff in the role of P1 and called too often in the role of P2.
   - Other predictions of BLT and CLT, such as subjects learning faster against FP or in the P2 condition, were not confirmed by the results.
   - The study suggests that games with counter-intuitive elements such as bluffing might take longer to learn than predicted, because subjects need to adapt their mental models to include these counter-intuitive elements in their solution sets.
53. References
   1. John Hawkin, Robert Holte, Duane Szafron, Automated Action Abstraction of Imperfect Information Extensive-Form Games, Department of Computing Science, University of Alberta, 2011.
   2. Sam Ganzfried, Tuomas Sandholm, Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information, Department of Computer Science, Carnegie Mellon University, 2009.
   3. Darryl A. Seale, Steven E. Phelan, Bluffing and Betting Behavior in a Simplified Poker Game, Journal of Behavioral Decision Making, University of Nevada Las Vegas, 2009.
   4. Jonathan Rubin, Ian Watson, Computer Poker: A Review, Department of Computer Science, University of Auckland, 2011.
   5. Nick Abou Risk, Duane Szafron, Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents, Department of Computer Science, University of Alberta, 2010.
   6. Sam Ganzfried, Tuomas Sandholm, Game Theory-Based Opponent Modeling in Large Imperfect-Information Games, Computer Science Department, Carnegie Mellon University, 2011.
   7. Andrea Cannizzaro, Il Texas hold'em in quattro e quattr'otto: regole, strategie e consigli utili per vincere live e online, Edizioni L'Airone, 2010.
   8. Wikipedia, http://www.wikipedia.org/