Lecture 4 - Opponent Modelling

Making Better Decisions -
Opponent Modelling

1

Monte Carlo in Poker
(Recap)

• Yesterday we saw that Monte Carlo could be used to
estimate the expected reward of an action by evaluating the
delayed reward
• We do this by simulating or "rolling out" games to their end
state.
• We then assess the amount we won or lost
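
As a minimal sketch of the recap above, the rollout idea can be written in a few lines. The toy game here is an assumption made purely for illustration (it is not Poker): folding always loses a 1-chip ante, while calling wins 2 chips with probability 0.4 and loses 2 otherwise. The names `rollout` and `estimate_reward` are hypothetical.

```python
import random

def rollout(action, rng):
    """Play one toy game to its end state and return chips won or lost.

    Hypothetical game: folding loses the 1-chip ante; calling wins
    2 chips with probability 0.4 and loses 2 chips otherwise.
    """
    if action == "fold":
        return -1.0
    return 2.0 if rng.random() < 0.4 else -2.0

def estimate_reward(action, n_rollouts=10_000, seed=0):
    """Average the delayed reward over many simulated games."""
    rng = random.Random(seed)
    total = sum(rollout(action, rng) for _ in range(n_rollouts))
    return total / n_rollouts

fold_value = estimate_reward("fold")
call_value = estimate_reward("call")
```

With enough rollouts the estimate for calling should settle near the true expectation of 0.4 x 2 - 0.6 x 2 = -0.4 chips, which is still better than the certain loss from folding.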

2

Game Tree and Monte
Carlo
[Figure: game tree diagram. Root node i with branches labelled F, C and R; node types shown are Opponent and Chance]

3

Random Walks
in the Game Tree

• When we walk the Game Tree at random, we pick nodes to
follow at random.
• We assume (for now) that this is an unbiased choice
• This means every choice has the same probability of being
chosen
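
A rough sketch of an unbiased walk, assuming the tree is stored as a dictionary from each node to its children. The node names below are hypothetical and only loosely follow the diagram from the earlier slide.

```python
import random

# Hypothetical toy tree: each key maps a node to its children.
# Nodes that do not appear as keys are leaves (end states).
TREE = {
    "i": ["F", "C", "R"],
    "C": ["opp_fold", "opp_call"],
    "R": ["opp_fold_r", "opp_call_r", "opp_raise_r"],
}

def uniform_walk(tree, start, rng):
    """Follow children chosen uniformly at random until a leaf is reached."""
    path = [start]
    while tree.get(path[-1]):
        # rng.choice gives every child the same probability of being chosen.
        path.append(rng.choice(tree[path[-1]]))
    return path

path = uniform_walk(TREE, "i", random.Random(0))
```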

4

Can We Do Better?

• Random walks are all well and good
• But a uniform distribution across action choices isn't
accurate
‣ Certain situations will make sensible players more likely to use
certain actions
• How can we bring this bias into play in the walk?

5

Classifying Opponents

• The way we do this is to work out what type of player
someone is.
• We observe them to get a better understanding of how they
operate.
• In Poker and other games, we can use all sorts of statistical
measures to quantify a player's type.

6

Action Prediction

• Once we know what kind of player someone is, we can flip
things on their head.
• We answered "what is the likelihood this player is type X
given we have seen this type of play?"
• We can now answer "what is the likelihood this player will
make action Y given they are of type X?"
• Remember from Bayes' Theorem last week, these questions
are closely linked
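
To make the link concrete, here is a small sketch of the Bayes update from "action given type" to "type given action". The two player types and all of the probabilities are made-up numbers for illustration only.

```python
def posterior_over_types(prior, likelihood, observed_action):
    """P(type | action) is proportional to P(action | type) * P(type)."""
    unnormalised = {
        t: prior[t] * likelihood[t][observed_action] for t in prior
    }
    evidence = sum(unnormalised.values())  # P(action) over all types
    return {t: p / evidence for t, p in unnormalised.items()}

# Hypothetical numbers: P(type) and P(action | type).
prior = {"tight": 0.5, "loose": 0.5}
likelihood = {
    "tight": {"fold": 0.7, "call": 0.2, "raise": 0.1},
    "loose": {"fold": 0.3, "call": 0.4, "raise": 0.3},
}

posterior = posterior_over_types(prior, likelihood, "raise")
```

Under these assumed numbers, seeing a raise moves the belief from 50/50 to 25% tight and 75% loose.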

7

Simple (Human)
Classification

• Pro Poker players try to sort their opponents into one of
several classes based on 3 measures
‣ Voluntarily Put in Pot (VPiP)
‣ Won at Showdown (WSD)
‣ Pre-flop Raise (PFR)

8

Player Stereotypes

• Players can be
‣ Tight / Loose (how likely they are to play hands)
‣ Passive / Aggressive pre-flop
‣ Passive / Aggressive post-flop
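
A hand-written classifier along these lines might look like the sketch below. The threshold values are assumptions chosen for illustration, not figures from the lecture, and only the pre-flop axis is covered.

```python
def stereotype(vpip, pfr, loose_above=0.25, aggressive_ratio=0.5):
    """Label a player from two pre-flop statistics given as fractions (0 to 1).

    loose_above and aggressive_ratio are hypothetical cut-offs.
    """
    looseness = "loose" if vpip > loose_above else "tight"
    # Share of voluntarily played hands that were raised pre-flop.
    ratio = pfr / vpip if vpip > 0 else 0.0
    style = "aggressive" if ratio > aggressive_ratio else "passive"
    return f"{looseness}-{style}"

rock = stereotype(vpip=0.15, pfr=0.12)
calling_station = stereotype(vpip=0.45, pfr=0.05)
```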

9

Utilising Stereotypes

• If we can classify players we can use this against them
• For instance, we might discover that passive players can be
chased off by aggressive play
• Or we understand that when a super-conservative player
decides to raise, we need to be careful
• We can build heuristic rule bases around this like we saw
before.
• Or we can be much smarter

10

Better Classifications

• Humans are getting by on 3 dimensions
• But Poker has waaaay more statistics available than this
• We can make a lot of use of this extra data.

11

Poker Tracker

• Poker Tracker is a stats package specifically for Poker
• Analyses play at online casinos
• Real-time access to stats about opponents
• Allows players to review hands later

12

Stats in Poker

• As noted a few slides ago, Poker has many statistics
• Poker Tracker keeps tabs on around 150 metrics
• Some of these are somewhat similar, some relate more to
the games than the players

13

Problem of Dimensionality

• The problem now is that we have too much information!
• Trying to learn on cluttered data can be problematic,
assuming it works at all.

14

Dimensionality Reduction

• Somehow we have to reduce the number of dimensions that
our data points are using.
• In many ways, getting the right data into a learning algorithm
is the biggest challenge.
• As much art as it is engineering.
• Two options
‣ Feature selection
‣ Feature extraction

15

Selection vs Extraction

• In Selection, you pick the dimensions you believe to be most
relevant
‣ The human players did this to get their 3 dimensional
representations
• In Extraction, you come up with new dimensions that can
represent your datapoint

16

Principal Components
Analysis

• PCA is a common strategy for this.
• Recasts the dimensions of the datapoint into another set of
"basis vectors".
• Smushes together dimensions that have a strong correlation
‣ Some stats measures are looking at fundamentally the same thing, in
different ways
‣ E.g. various raise frequency metrics might be treated as a single
“aggression” dimension after PCA

17

Analysis

• This was going to be a worked example.
• Honestly, that’s way too painful.
• For N observations in M dimensions, X is an M × N matrix
where each column is an observation.
• Calculate the mean and std. dev. for each row in the
matrix (each dimension)

18

Analysis
• Calculate the covariance matrix, the amount that
the dimensions vary with respect to each other.
• Calculate the eigenvectors and eigenvalues of the
covariance matrix
‣ The eigenvectors are the new basis vectors of the
reduced-dimension datapoints
‣ The eigenvalues represent how significant the
eigenvector is. Large value = significant
19

Analysis

• Pick the most significant K of the eigenvectors.
• Project the original datapoints in X onto the new
basis vectors.
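
The steps on the last few slides can be sketched with NumPy as follows. This is one reasonable reading of the procedure rather than a definitive implementation: it assumes the per-row mean and standard deviation are used to standardise the data before the covariance is taken, and the toy data and the name `pca` are made up for illustration.

```python
import numpy as np

def pca(X, k):
    """Reduce an M x N matrix X (one observation per column) to k dimensions."""
    # Mean and standard deviation of each row (each dimension).
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # leave constant dimensions unscaled
    Z = (X - mean) / std

    # Covariance between dimensions (np.cov treats each row as a variable).
    cov = np.cov(Z)

    # eigh suits symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    basis = eigvecs[:, top]              # M x k, one basis vector per column

    # Project every standardised observation onto the new basis vectors.
    return basis.T @ Z, basis, eigvals[top]

# Toy data: 3 statistics for 200 players, the first two strongly correlated.
rng = np.random.default_rng(0)
raise_freq = rng.normal(size=200)
X = np.vstack([
    raise_freq,
    2 * raise_freq + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
reduced, basis, eigvals = pca(X, k=2)
```

Because the first two rows measure nearly the same thing, the first new dimension should absorb both of them, which is the "aggression" effect described earlier.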

20

Analysis

• Honestly, if anyone ever asks you to do this
‣ Get a textbook
‣ Use Matlab
‣ Be really careful because it’s kind of complicated
• It is possible to do it by hand.
‣ I can’t anymore...

21

Analysis
• Assuming that you finish the calculations without
mucking up.
‣ Or, you find something to work it out for you (Matlab
functions for this exist)
• What you have now is a new datapoint that carries
approximately the same information.
• It is recast into fewer dimensions.
‣ Note that the new dimensions will not have an obvious meaning
22

Clustering Algorithms

• Having performed PCA, we have a much more manageable
set of datapoints, and we’ve eliminated extraneous
dimensions
• Now we need to group them together.
• Clustering algorithms are one approach.
• They try to find a set of “clusters” of points that are grouped
together.

24

Clustering
[Figure: scatter plot of clustered datapoints; x-axis from 0 to 30, y-axis from 0 to 50]
Blue Peter style example - real data is rarely so neat
25

Clustering

• k-means is one of the most popular algorithms
‣ Others exist, fuzzy c-means, FLAME clustering and more
• Pick a value for k
‣ You can play around a bit to find good values or use
some tricks
‣ Accepted “rule of thumb” :

26

K-Means Algorithm

• Typically, we run the k-means algorithm as an
“iterative refinement” process
‣ Guess at some initial values, keep running the process
round and round until it stabilises
• Randomly assign datapoints to one of the k clusters
• Step 1 - Calculate centroids of the clusters
• Step 2 - Update assignment based on new centroids
• Rinse and repeat 1 and 2 until convergence.
27

K-Means Algorithm

• Calculating Centroids of clusters
$$m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j$$
‣ $x_j$ denotes the datapoints being sampled
‣ $m_i^{(t+1)}$ denotes mean of cluster i at iteration t+1
‣ $S_i^{(t)}$ denotes the set of datapoints assigned to cluster i at
iteration t
• Effectively, the average of the datapoints

28

K-Means Algorithm

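• Assigning Datapoints to Clusters
$$S_i^{(t)} = \{\, x_j : \lVert x_j - m_i^{(t)} \rVert \le \lVert x_j - m_c^{(t)} \rVert \ \text{for all } c = 1, \dots, k \,\}$$
• The set of points $S_i$ is all datapoints for which the
centroid of cluster i ($m_i$) is the nearest centroid.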
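
Putting the two steps together, a minimal sketch of the loop in plain Python might look like this. It follows the slides in starting from a random assignment of datapoints to clusters. Re-seeding an empty cluster with a random datapoint is an assumption added here, since the slides do not say what to do in that case.

```python
import math
import random

def k_means(points, k, seed=0, max_iters=100):
    """Cluster a list of equal-length tuples; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Initial guess: randomly assign each datapoint to one of the k clusters.
    assignment = [rng.randrange(k) for _ in points]
    centroids = [None] * k
    for _ in range(max_iters):
        # Step 1: the centroid of each cluster is the mean of its datapoints.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if not members:
                members = [rng.choice(points)]  # assumption: re-seed empty cluster
            centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 2: reassign each datapoint to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no assignment changed this round
        assignment = new_assignment
    return assignment, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = k_means(points, k=2)
```

On data this neat the two groups separate cleanly; on real data the result can depend on the initial guess, so it is common to run the loop several times with different seeds.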

29

K-Means Worked Example

• Board work

30

From Classification to
Prediction
• Once we have our clusters defined, we know what
datapoints constitute the type of player we are analysing
• We can use this to predict what the player will do
‣ We have a collection of “similar” players, we can use
their history.
‣ We may be able to use the raw data from the
observations directly.
• In either case, we can use the classification to predict actions

31

Back to Monte Carlo

• So, back to the game tree.
• We now have an idea of what type of player we are dealing
with.
• We have an idea of what actions the players are going to
take in given situations.
• Can we plug this back into the Monte Carlo simulation?

32

Informed Walks
in the Game Tree

• We talked earlier about Opponent nodes in the game tree
• Specifically, when we hit an Opponent node, we would use a
uniform distribution to randomly pick between the options
available.
• Now, we can bias that distribution towards selecting the
action we expect the player to take.
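
As a sketch of the difference, the choice at an Opponent node can be drawn from a weighted distribution instead of a uniform one. The function name and the predicted probabilities are hypothetical; falling back to a uniform choice when no usable prediction exists is an assumption made here.

```python
import random

def pick_opponent_action(actions, rng, predicted=None):
    """Choose the opponent's action at an Opponent node.

    predicted maps action -> probability from the opponent model.
    """
    if predicted is not None:
        weights = [predicted.get(a, 0.0) for a in actions]
        if sum(weights) > 0:
            # Informed walk: bias the draw towards the expected action.
            return rng.choices(actions, weights=weights, k=1)[0]
    # Uninformed walk: every action is equally likely.
    return rng.choice(actions)

rng = random.Random(0)
actions = ["fold", "call", "raise"]
model = {"fold": 0.6, "call": 0.3, "raise": 0.1}  # hypothetical prediction
samples = [pick_opponent_action(actions, rng, model) for _ in range(5000)]
fold_share = samples.count("fold") / len(samples)
```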

33

Does This Work?

• Intuitively, it should
• The more accurate we make the simulation, the
more accurate the results should be.
• The concern is that the prediction process will slow
things down too much
‣ Monte Carlo relies on large numbers of samples; if each one
takes too long, the extra accuracy per sample isn’t helping.

34

Does This Work?

• We don’t know.
• It’s been proven to aid Monte Carlo for Poker when
k=1
‣ All players are treated as a generic “player”
• This is ongoing research right now in SAIG.
• Look for papers next year. :)

35

What We Do Know

• We’ve previously attempted Machine Learning for
Opponent Modelling.
• Using 32 different statistical measures (reduced
down to 8 significant dimensions by PCA)
• Training data of 700,000 hands of Poker
• Successfully extracted around 28 different player
stereotypes.
36

The Aim of the Game

• We aren’t going to be able to make an AI that
always wins at Poker
• There’s too much chance involved
‣ Bad hands come up
‣ Misinterpreting players
• What we want to do is make an AI that performs
better than the other players under the same
circumstances
37

Evaluation

• Any time we do research we are testing some sort
of scientific hypothesis.
• We need to design experiments to test whether the
hypothesis is true or not
• Science doesn't care if we're right - unbiased. Even if
we're wrong, we have learnt something.

38

Evaluation

• Consider a pro Poker player
• Will win some games and lose others
‣ In fact, a fundamental rule of good poker play is not even
taking part in about 80% of the games you sit through
• Measuring in terms of a single game doesn't work
‣ Need to look at the forest, not the trees
• What counts is how much money the player wins at
the end.
39

Measuring the Strength of
an AI
• What we need is a measure of how successful a bot
is on average.
• Poker gives a metric for this - big blinds won per 100 hands (BB/100)
‣ The metric is in terms of the table limit, so it is normalised
• Note that even for a large number of games, the
variance on this measure can be really big.
‣ Recall Black Swan events - low likelihood, high
impact. Large wins are Black Swans here.
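
For reference, the metric itself is a short calculation. This sketch assumes results are recorded as chips won or lost per hand; the function name and the sample numbers are made up.

```python
def bb_per_100(winnings, big_blind):
    """Big blinds won per 100 hands, from per-hand chip results."""
    hands = len(winnings)
    if hands == 0:
        raise ValueError("need at least one hand")
    big_blinds_won = sum(winnings) / big_blind
    return big_blinds_won / hands * 100

# Four hands at a table with a 2-chip big blind: one large win, small losses.
rate = bb_per_100([10, -2, -2, 4], big_blind=2)
```

A sample this small shows the variance problem directly: dropping the single 10-chip win turns a healthy positive rate into a break-even one.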
40

Stable Experimentation

• We really need a way to remove the variance from
the problem.
• Ordinarily we might repeat the experimentation, take
a large number of samples, and use the law of large numbers
to our advantage.
• We talked yesterday about the state space of just the
card dealing component of Poker
‣ We know it's too large for this to be an option
41

Experimentation

• What if we generate experimental scenarios?
• A large number of games, with the deck already
configured.
• We can play the scenario with player A
• Then replay the exact same scenario with player B
• The results that player A and B generate are now
comparable.
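
A minimal sketch of this replay idea: shuffle the decks once, store them, and hand the identical decks to each player. The one-card "game" and the two toy players below are assumptions for illustration only; a real experiment would plug in a full Poker engine.

```python
import random

def make_scenarios(n, seed=0):
    """Pre-shuffle n decks so every player faces exactly the same cards."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        deck = list(range(52))
        rng.shuffle(deck)
        scenarios.append(deck)
    return scenarios

def evaluate(player, scenarios):
    """Average result of a player over a fixed set of scenarios."""
    return sum(player(deck) for deck in scenarios) / len(scenarios)

# Toy game: we hold deck[0], the opponent holds deck[1], high card wins 1 chip.
def reckless(deck):
    return 1 if deck[0] > deck[1] else -1

def cautious(deck):
    if deck[0] < 26:
        return 0  # fold weak cards and lose nothing
    return 1 if deck[0] > deck[1] else -1

scenarios = make_scenarios(1000)
score_a = evaluate(reckless, scenarios)
score_b = evaluate(cautious, scenarios)
```

Because both players see the same deals, any difference between `score_a` and `score_b` comes from their decisions rather than from the luck of the cards.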
42

Experimental Design

• Designing good experiments is really important
• Not just for AI but for all kinds of things
• Understanding sources of uncertainty means we can
find ways to factor them out
• Design fair unbiased experiments
• For Science!

43

Summary

• More detail on Monte Carlo in Poker
• Explanation of Opponent Modelling in Poker
‣ Dimensionality Reduction
‣ Clustering algorithms
• Exploiting Opponent Models
• Experimental Design

44

Next Week

• Other uses for Opponent Models
• Procedural Content Generation
• AI in Video Games

45
