Lecture 4 - Opponent Modelling

This is the 4th of an 8-lecture series that I presented at the University of Strathclyde in 2011/2012 as part of the final-year AI course.

This lecture shows how we can use mathematical analysis to classify players into stereotypes, and how to leverage this classification to generate more successful decisions.

(Some content appears to be missing from the end of this one - I'll fix this as soon as I can)

  1. Making Better Decisions - Opponent Modelling
  2. Monte Carlo in Poker (Recap)
     • Yesterday we saw that Monte Carlo could be used to estimate the expected reward of an action by evaluating the delayed reward
     • We do this by simulating or "rolling out" games to their end state
     • Assess the amount we won or lost
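     A minimal sketch of the rollout idea in Python; the game-state interface (apply, is_terminal, legal_actions, reward) is a hypothetical one for illustration:

        import random

        def rollout_value(state, action, n_rollouts=1000):
            # Estimate the expected reward of taking 'action' by rolling
            # games out to their end state and averaging final winnings.
            total = 0.0
            for _ in range(n_rollouts):
                s = state.apply(action)
                while not s.is_terminal():
                    # Unbiased walk: every legal action is equally likely
                    s = s.apply(random.choice(s.legal_actions()))
                total += s.reward()   # amount we won or lost this rollout
            return total / n_rollouts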
  3. Game Tree and Monte Carlo
     [Diagram: a Poker game tree with action branches labelled F, R, C, and Opponent and Chance node types]
  4. Random Walks in the Game Tree
     • When we walk the Game Tree at random, we pick nodes to follow at random
     • We assume (for now) that this is an unbiased choice
     • This means every choice has the same probability of being chosen
  5. Can We Do Better?
     • Random walks are all well and good
     • But a uniform distribution across action choices isn't accurate
       ‣ Certain situations will make sensible players more likely to use certain actions
     • How can we bring this bias into play in the walk?
  6. Classifying Opponents
     • The way we do this is to work out what type of player someone is
     • We observe them to get a better understanding of how they operate
     • In Poker and other games, we can use all sorts of statistical measures to quantify a player's type
  7. Action Prediction
     • Once we know what kind of player someone is, we can flip things on their head
     • We answered "what is the likelihood this player is of type X given we have seen this type of play?"
     • We can now answer "what is the likelihood this player will make action Y given they are of type X?"
     • Remember from Bayes' Theorem last week, these questions are closely linked
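     A toy calculation showing the link; every number here is made up purely for illustration:

        # Prior over player types, and P(observed play | type) - assumed
        priors     = {"tight": 0.3, "loose": 0.7}
        likelihood = {"tight": 0.1, "loose": 0.6}  # P(raised pre-flop | type)

        # Bayes' Theorem: P(type | observation) is proportional to
        # P(observation | type) * P(type)
        evidence  = sum(likelihood[t] * priors[t] for t in priors)
        posterior = {t: likelihood[t] * priors[t] / evidence for t in priors}

        # Predict the next action by mixing each type's action model
        p_raise = {"tight": 0.05, "loose": 0.5}    # P(raise | type), assumed
        p_next_raise = sum(posterior[t] * p_raise[t] for t in priors)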
  8. Simple (Human) Classification
     • Pro Poker players try to quantify their opponents into one of several classes based on 3 measures
       ‣ Voluntarily Put in Pot (VPiP)
       ‣ Won at Showdown (WSD)
       ‣ Pre-flop Raise (PFR)
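     A minimal sketch of how these measures might be computed from observed hands; the per-hand record format is a made-up one:

        def player_stats(hands):
            # 'hands' is a hypothetical list of per-hand records for one
            # player, each a dict of booleans: voluntarily_bet,
            # preflop_raise, reached_showdown, won_showdown.
            n = len(hands)
            vpip = sum(h["voluntarily_bet"] for h in hands) / n
            pfr  = sum(h["preflop_raise"] for h in hands) / n
            showdowns = [h for h in hands if h["reached_showdown"]]
            wsd = (sum(h["won_showdown"] for h in showdowns)
                   / max(len(showdowns), 1))
            return {"VPiP": vpip, "WSD": wsd, "PFR": pfr}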
  9. Player Stereotypes
     • Players can be
       ‣ Tight / Loose (how likely they are to play hands)
       ‣ Passive / Aggressive pre-flop
       ‣ Passive / Aggressive post-flop
  10. Utilising Stereotypes
     • If we can classify players, we can use this against them
     • For instance, we might discover that passive players can be chased off by aggressive play
     • Or we understand that when a super-conservative player decides to raise, we need to be careful
     • We can build heuristic rule bases around this, like we saw before
     • Or we can be much smarter
  11. Better Classifications
     • Humans are getting by on 3 dimensions
     • But Poker has far more statistics available than this
     • We can make a lot of use of this extra data
  12. Poker Tracker
     • Poker Tracker is a stats package specifically for Poker
     • Analyses play at online casinos
     • Real-time access to stats about opponents
     • Allows players to review hands later
  13. Stats in Poker
     • As mentioned a few slides ago, Poker has many statistics
     • Poker Tracker keeps tabs on around 150 metrics
     • Some of these are somewhat similar, and some relate more to the games than the players
  14. Problem of Dimensionality
     • The problem now is that we have too much information!
     • Trying to learn on cluttered data can be problematic, assuming it works at all
  15. Dimensionality Reduction
     • Somehow we have to reduce the number of dimensions that our data points are using
     • In many ways, getting the right data into a learning algorithm is the biggest challenge
     • As much art as it is engineering
     • Two options
       ‣ Feature selection
       ‣ Feature extraction
  16. Selection vs Extraction
     • In Selection, you pick the dimensions you believe to be most relevant
       ‣ The human players did this to get their 3-dimensional representations
     • In Extraction, you come up with new dimensions that can represent your datapoint
  17. Principal Components Analysis
     • PCA is a common strategy for this
     • Recasts the dimensions of the datapoint into another set of "basis vectors"
     • Smushes together dimensions that have a strong correlation
       ‣ Some stats measures are looking at fundamentally the same thing, in different ways
       ‣ E.g. various raise frequency metrics might be treated as a single "aggression" dimension after PCA
  18. Principal Components Analysis
     • This was going to be a worked example
     • Honestly, that's way too painful
     • For N observations in M dimensions, X is an M×N matrix where each column is an observation
     • Calculate the mean and std. dev. for each row in the matrix (each dimension)
  19. Principal Components Analysis
     • Calculate the covariance matrix: the amount that the dimensions vary with respect to each other
     • Calculate the eigenvectors and eigenvalues of the covariance matrix
       ‣ The eigenvectors are the new basis vectors of the reduced-dimension datapoints
       ‣ The eigenvalues represent how significant each eigenvector is. Large value = significant
  20. Principal Components Analysis
     • Pick the most significant K of the eigenvectors
     • Project the original datapoints in X onto the new basis vectors
  21. Principal Components Analysis
     • Honestly, if anyone ever asks you to do this
       ‣ Get a textbook
       ‣ Use Matlab
       ‣ Be really careful, because it's kind of complicated
     • It is possible to do it by hand
       ‣ I can't anymore...
  22. Principal Components Analysis
     • Assuming that you finish the calculations without mucking up
       ‣ Or, you find something to work it out for you (Matlab functions for this exist)
     • What you have now is a new datapoint that carries approximately the same information
     • Recast into fewer dimensions
       ‣ Note that the new dimensions will not make intuitive sense
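     For reference, the pipeline on the last few slides can be sketched in a few lines of Python with numpy. This is a minimal sketch (it assumes no dimension has zero variance), not a substitute for a library implementation:

        import numpy as np

        def pca_reduce(X, k):
            # X: M x N matrix, each column one observation (slide 18).
            # Standardise each row (dimension): subtract mean, divide
            # by std. dev.
            mu = X.mean(axis=1, keepdims=True)
            sd = X.std(axis=1, keepdims=True)
            Z = (X - mu) / sd
            # Covariance matrix of the dimensions, then its
            # eigendecomposition (eigh: the matrix is symmetric).
            C = np.cov(Z)
            eigvals, eigvecs = np.linalg.eigh(C)
            # Keep the k eigenvectors with the largest eigenvalues
            # (the most significant ones).
            order = np.argsort(eigvals)[::-1][:k]
            basis = eigvecs[:, order]          # new basis vectors, M x k
            # Project the original datapoints onto the new basis: k x N.
            return basis.T @ Z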
  23. PCA in Action
  24. Clustering Algorithms
     • Having performed PCA, we have a much more manageable set of datapoints, and we've eliminated extraneous dimensions
     • Now we need to group them together
     • Clustering algorithms are one approach
       ‣ They try to find a set of "clusters" of points that are grouped together
  25. Clustering
     [Scatter plot of example datapoints forming distinct clusters]
     • Blue Peter style example - real data is rarely so neat
  26. Clustering
     • k-means is one of the most popular algorithms
       ‣ Others exist: fuzzy c-means, FLAME clustering and more
     • Pick a value for k
       ‣ You can play around a bit to find good values, or use some tricks
       ‣ Accepted "rule of thumb": k ≈ √(n/2), for n datapoints
  27. K-Means Algorithm
     • Typically, we run the k-means algorithm as an "iterative refinement" process
       ‣ Guess at some initial values, keep running the process round and round until it stabilises
     • Randomly assign datapoints to one of the k clusters
     • Step 1 - Calculate centroids of the clusters
     • Step 2 - Update assignment based on new centroids
     • Rinse and repeat 1 and 2 until convergence
  28. K-Means Algorithm
     • Calculating centroids of clusters:
       m_i^(t+1) = (1 / |S_i^(t)|) · Σ_{x_j ∈ S_i^(t)} x_j
       ‣ x_j denotes the datapoints being sampled
       ‣ m_i^(t+1) denotes the mean of cluster i at iteration t+1
       ‣ S_i^(t) denotes the set of datapoints assigned to cluster i at iteration t
     • Effectively, the average of the datapoints
  29. K-Means Algorithm
     • Assigning datapoints to clusters:
       S_i^(t) = { x_j : ‖x_j − m_i^(t)‖ ≤ ‖x_j − m_l^(t)‖ for all l }
     • The set of points S_i is all datapoints for which the centroid of cluster i (m_i) is the nearest centroid
  30. K-Means Worked Example
     • Board work
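     Since the worked example was done on the board, here is a minimal Python/numpy sketch of the iterative refinement from the previous slides (it makes no attempt to handle clusters that become empty):

        import numpy as np

        def kmeans(points, k, rng=np.random.default_rng(0)):
            # points: N x d array of datapoints (e.g. PCA-reduced stats).
            # Randomly assign datapoints to one of the k clusters.
            assign = rng.integers(0, k, len(points))
            while True:
                # Step 1 - centroids: mean of each cluster's datapoints.
                centroids = np.array([points[assign == i].mean(axis=0)
                                      for i in range(k)])
                # Step 2 - reassign each point to its nearest centroid.
                dists = np.linalg.norm(points[:, None] - centroids[None],
                                       axis=2)
                new_assign = dists.argmin(axis=1)
                if (new_assign == assign).all():   # converged
                    return centroids, assign
                assign = new_assign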
  31. From Classification to Prediction
     • Once we have our clusters defined, we know what datapoints constitute the type of player we are analysing
     • We can use this to predict what the player will do
       ‣ We have a collection of "similar" players, so we can use their history
       ‣ We may be able to use the raw data from the observations directly
     • In either case, we can use the classification to predict actions
  32. Back to Monte Carlo
     • So, back to the game tree
     • We now have an idea of what type of player we are dealing with
     • We have an idea of what actions the players are going to take in given situations
     • Can we plug this back into the Monte Carlo simulation?
  33. Informed Walks in the Game Tree
     • We talked earlier about Opponent nodes in the game tree
     • Specifically, when we hit an Opponent node, we would use a uniform distribution to randomly pick between the options available
     • Now, we can bias that distribution towards selecting the action we expect the player to take
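     A sketch of that change at an Opponent node; the opponent model here is a hypothetical mapping from actions to predicted probabilities:

        import random

        def pick_action(actions, opponent_model=None):
            # With no model, fall back to the uniform choice used in the
            # plain random walk.
            if opponent_model is None:
                return random.choice(actions)
            # Otherwise, bias the walk towards the actions we predict
            # this player type would take.
            weights = [opponent_model.get(a, 0.0) for a in actions]
            return random.choices(actions, weights=weights, k=1)[0]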
  34. Does This Work?
     • Intuitively, it should
     • The more accurate we make the simulation, the more accurate the results should be
     • The concern is that the prediction process will slow things down too much
       ‣ Monte Carlo relies on large numbers of samples; if they take too long, the added accuracy isn't helping
  35. Does This Work?
     • We don't know
     • It's been proven to aid Monte Carlo for Poker when k=1
       ‣ All players are treated as a generic "player"
     • This is ongoing research right now in SAIG
     • Look for papers next year. :)
  36. What We Do Know
     • We've previously attempted Machine Learning for Opponent Modelling
     • Using 32 different statistical measures (reduced down to 8 significant dimensions by PCA)
     • Training data of 700,000 hands of Poker
     • Successfully extracted around 28 different player stereotypes
  37. The Aim of the Game
     • We aren't going to be able to make an AI that always wins at Poker
     • There's too much chance involved
       ‣ Bad hands come up
       ‣ Mis-interpreting players
     • What we want to do is make an AI that performs better than the other players under the same circumstances
  38. Evaluation
     • Any time we do research, we are testing some sort of scientific hypothesis
     • We need to design experiments to test whether the hypothesis is true or not
     • Science doesn't care if we're right - it's unbiased. Even if we're wrong, we have learnt something
  39. Evaluation
     • Consider a pro Poker player
     • They will win some games and lose others
       ‣ In fact, a fundamental rule of good Poker play is not even taking part in about 80% of the games you sit through
     • Measuring in terms of a single game doesn't work
       ‣ Need to look at the forest, not the trees
     • What counts is how much money the player wins at the end
  40. Measuring the Strength of an AI
     • What we need is a measure of how successful a bot is on average
     • Poker gives us a metric for this - big blinds won per 100 hands (BB/100)
       ‣ The metric is in terms of the table limit, so it is normalised
     • Note that even for a large number of games, the variance on this measure can be really big
       ‣ Recall Black Swan events - low likelihood, high impact. Large wins are Black Swans here
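     The metric itself is just a normalisation; a minimal sketch:

        def bb_per_100(total_winnings, big_blind, hands_played):
            # Winnings normalised by the table's big blind, per 100 hands.
            return (total_winnings / big_blind) / hands_played * 100

        # e.g. winning 250 units at a 2-unit big blind over 5,000 hands
        # gives bb_per_100(250.0, 2.0, 5000) == 2.5 BB/100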
  41. Stable Experimentation
     • We really need a way to remove the variance from the problem
     • Ordinarily we might repeat the experimentation, take a large number of samples, and use the law of averages to our advantage
     • We talked yesterday about the state space of just the card-dealing component of Poker
       ‣ We know it's too large for this to be an option
  42. Experimentation
     • What if we generate experimental scenarios?
     • A large number of games, with the deck already configured
     • We can play the scenario with player A
     • Then replay the exact same scenario with player B
     • The results that players A and B generate are now comparable
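     A sketch of this paired design; play_game is a hypothetical function that deals from a deck shuffled with the supplied generator and returns the player's winnings:

        import random

        def compare(player_a, player_b, play_game, n_scenarios=10000):
            # Seeding the shuffle identically for A and B means both
            # players face the exact same pre-configured scenarios,
            # making their average results directly comparable.
            total_a = total_b = 0.0
            for seed in range(n_scenarios):
                total_a += play_game(player_a, random.Random(seed))
                total_b += play_game(player_b, random.Random(seed))
            return total_a / n_scenarios, total_b / n_scenarios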
  43. Experimental Design
     • Designing good experiments is really important
     • Not just for AI, but for all kinds of things
     • Understanding sources of uncertainty means we can find ways to factor them out
     • Design fair, unbiased experiments
     • For Science!
  44. Summary
     • More detail on Monte Carlo in Poker
     • Explanation of Opponent Modelling in Poker
       ‣ Dimensionality Reduction
       ‣ Clustering algorithms
     • Exploiting Opponent Models
     • Experimental Design
  45. Next Week
     • Other uses for Opponent Models
     • Procedural Content Generation
     • AI in Video Games
