Strategies Without Frontiers

Predicting your adversary's behaviour is the holy grail of threat modeling. This talk will explore the problem of adversarial reasoning under uncertainty through the lens of game theory, the study of strategic decision-making among cooperating or conflicting agents. After a thorough grounding in classical two-player games such as the Prisoner's Dilemma and the Stag Hunt, we will consider the curious patterns that emerge in iterated, round-robin, and societal iterated games.

But as a tool for the real world, game theory seems to put the cart before the horse: how can you choose the proper strategy if you don't necessarily even know what game you're playing? For this, we turn to the relatively young field of probabilistic programming, which enables us to make powerful predictions about adversaries' strategies and behaviour based on observed data.

This talk is intended for a general audience; if you can compare two numbers and know which one is bigger than the other, you have all the mathematical foundations you need.

Strategies Without Frontiers

  1. 1. Meredith L. Patterson BSidesLV August 5, 2014 STRATEGIES WITHOUT FRONTIERS
  2. 2.  I hate boring problems  I especially hate solving tiny variations on the same boring problem over and over again  The internet is full of the same boring problems over and over again  Both in the cloud …  … and in the circus  Not my circus, not my monkeys MOTIVATION
  3. 3.  Information theory  Probability theory  Formal language theory (of course)  Control theory  First-order logic  Haskell ALSO APPEARING IN THIS TALK
  4. 4.  When an unknown agent acts, how do you react?  Observation of side effects  Signals the agent sends  Past interactions with others  Formal language theory (if you’re a computer)  Systematic knowledge about the structure of interactions and the incentives involved in them IT IS PITCH BLACK. YOU ARE LIKELY TO BE EATEN BY A GRUE.
  5. 5.  Everything You Actually Need to Know About Classical Game Theory  in math …  … and psychology  Changing the Game  Extensive form and signaling games  Multiplayer and long-running games  Reasoning Under Uncertainty, Over Real Data OUTLINE
  7. 7.  Players  Information available at each decision point  Possible actions at each decision point  Payoffs for each outcome  Strategies (pure or mixed)  Or behaviour, in iterated or turn-taking games  Equilibria  Different kinds of games have different kinds of equilibria WHAT’S IN A GAME?
  8. 8. A NORMAL FORM GAME (each cell: row player's payoff, column player's payoff)
                     Cooperate   Defect
        Cooperate    a, b        c, d
        Defect       e, f        g, h
  9. 9.  Pure strategy: fully specified set of moves for every situation  Mixed strategy: probability assigned to each possible move, random path through game tree  Behaviour strategies: probabilities assigned at information sets STRATEGIES
  10. 10. PRISONER’S DILEMMA
                     Cooperate   Defect
        Cooperate    -1, -1      -3, 0
        Defect       0, -3       -2, -2
        d, e > a, b > g, h > c, f
  11. 11. MATCHING PENNIES
                     Heads       Tails
        Heads        1, -1       -1, 1
        Tails        -1, 1       1, -1
        a = d = f = g > b = c = e = h
  12. 12. DEADLOCK
                     Cooperate   Defect
        Cooperate    1, 1        0, 3
        Defect       3, 0        2, 2
        e > g > a > c and d > h > b > f
  13. 13. STAG HUNT
                     Stag        Hare
        Stag         2, 2        0, 1
        Hare         1, 0        1, 1
        a = b > d = e = g = h > c = f
  14. 14. CHICKEN
                     Swerve      Straight
        Swerve       0, 0        -1, 1
        Straight     1, -1       -10, -10
        e > a > c > g and d > b > f > h
  15. 15. HAWK/DOVE
                     Share       Fight
        Share        V/2, V/2    0, V
        Fight        V, 0        (V-C)/2, (V-C)/2
        e > a > c > g and d > b > f > h
  16. 16. BATTLE OF THE SEXES
                     Opera       Football
        Opera        3, 2        0, 0
        Football     0, 0        2, 3
        (a > g and h > b) > c = d = e = f
  17. 17.  Games can be zero-sum or non-zero-sum  Games can be about conflict or cooperation  Actions are not inherently morally valenced  Payoffs determine type of game, strategy WHAT HAVE WE SEEN SO FAR?
  18. 18.  Cournot equilibrium: each actor’s output maximizes its profit given the outputs of other actors  Nash equilibrium: each actor is making the best decision they can, given what they know about each other’s decisions  Subgame perfect equilibrium: eliminates non- credible threats  Trembling hand equilibrium: considers the possibility that a player might make an unintended move EQUILIBRIUM
  20. 20. MIND GAMES “As far as the theory of games is concerned, the principle which emerges here is that any social intercourse whatsoever has a biological advantage over no intercourse at all.”
  21. 21.  Procedures  Operations  Rituals  Pastimes  (Predatory) Games TYPES OF INTERACTIONS
  22. 22.  “Hands” or roles = players  Extensive form; players move in response to each other  Advantages  Existential advantage: confirmation of existing beliefs  Internal psychological advantage: direct emotional payoff  External psychological advantage: avoiding a feared situation  Internal social advantage: structure/position with respect to other players  External social advantage: as above, wrt non-players BERNE’S GAMES: STRUCTURE
  23. 23.  Kick Me  Goal: Sympathy  Find someone to beat on you, then whine about it  “My misfortunes are better than yours”  Ain’t It Awful  Can be a pastime, but also manifests as a game  Player displays distress; payoff is sympathy and help  Why Don’t You – Yes, But  Player claims to want advice. Player doesn’t really want it.  Goal: Reassurance BERNE’S GAMES: EXAMPLES
  24. 24.  Now I’ve Got You, You Son Of A Bitch  Goal: Justification (or just money)  Three-handed version is the badger game  Roles  Victim  Aggressor  Confederate  Moves  Provocation → Accusation  Defence → Accusation  Defence → Punishment THE BADGER GAME
  25. 25.  “Schlemiel,” in Berne’s glossary  Moves:  Provocation → resentment  (repeat)  If B responds with anger, A appears justified in more anger  If B keeps their cool, A still keeps pushing TROLLING
  26. 26.  Social media  Organic responses against predatory games  Predator Alert Tool  /r/TumblrInAction “known trolls” wiki  Those just happen to be ones I know about  A truly generic reputation system is probably a pipe dream  Wikipedia  eBay  But for these, we have to extend the basic mathematical model. OTHER MONKEY GAMEBOARDS
  28. 28. THE SETUP
  29. 29. THE TYPE [game tree: player 1 moves first at the root, choosing Split or Steal]
  30. 30. BOTH SPLIT
  31. 31. BOTH SPLIT [game tree: player 1 chooses Split or Steal, then signals A or B; player 2 Split after player 1 Split → 6800, 6800]
  33. 33. ONE SPLITS, ONE STEALS [game tree adds: player 1 Split, player 2 Steal → 0, 13600; player 1 Steal, player 2 Split → 13600, 0]
  34. 34. BOTH STEAL
  35. 35. BOTH STEAL [game tree adds: player 1 Steal, player 2 Steal → 0, 0]
  36. 36. NORMAL FORM Also known as the Friend-or-Foe game.
                     Split       Steal
        Split        1, 1        0, 2
        Steal        2, 0        0, 0
        d = e > a = b > c = f = g = h
  38. 38. FIRST MOVE: NICK’S CHOICE [game tree as on slide 35, with signals A and B labelled “I’m likely to split” and “I’m likely to steal”]
  39. 39. SIGNALING
  40. 40. SECOND MOVE: NICK’S SIGNAL [same game tree, signals labelled “I’m likely to split” and “I’m likely to steal”]
  41. 41. THE BIG REVEAL
  42. 42. THE COMPLETE PATH [same game tree, signals labelled “I’m likely to split” and “I’m likely to steal”]
  44. 44.  Strategies now depend on payoff matrix and history  Axelrod, 1981: how well do these strategies perform against each other over time?  “Ecological” tournaments: players abandon bad strategies  Rapoport: if the only information you have is how player X interacted with you last time, the best you can do is Tit-for-Tat  TFT cannot score higher than its opponent  Axelrod: “Don’t be envious”  Against TFT, no one can do better than cooperate  Axelrod: “Don’t be too clever” ITERATED GAMES
  45. 45.  Nice: S is a nice strategy iff it will not defect on someone who has not defected on it  Retaliatory: S is a retaliatory strategy iff it will defect on someone who defects on it  Forgiving: S is a forgiving strategy iff it will stop defecting on someone who stops defecting on it PROPERTIES
  46. 46. SOCIETAL ITERATED GAME THEORY
        Ord/Blair, 2002: what happens when strategies can take into account all past interactions?
        We can express strategies in convenient first-order logic, as it turns out
          Tit-for-Tat: D(c, r, p)
          Tit-for-Two-Tats: D(c, r, p) ∧ D(c, r, b(p))
          Grim: ∃t D(c, r, t)
          Bully: ¬∃t D(c, r, t)
          Spiteful-Bully: ¬∃t D(c, r, t) ∨ ∃s (D(c, r, s) ∧ D(c, r, b(s)) ∧ D(c, r, b(b(s))))
          Vigilante: ∃j D(c, j, p)
          Police: D(c, r, p) ∨ ∃j (D(c, j, p) ∧ ¬∃k D(j, k, b(p)))
  47. 47. EVOLUTION IS A HARSH MISTRESS Tit-for-Tat All-Cooperate Spiteful-Bully
  48. 48. PEACEKEEPING Police All-Cooperate Spiteful-Bully
  49. 49.  In a society, niceness is more nuanced  Individually nice: will not defect on someone who has not defected on it  Meta-individually nice: will not defect on individually nice  Communally nice: will not defect on someone who has not defected at all  Meta-communally nice: will not defect on communally nice  Same applies to forgiveness and retaliation  Loyalty: will not defect on the same strategy as itself NICENESS AND LOYALTY
  50. 50.  Peacekeepers don’t always agree  Police will defect on Vigilantes and vice versa  Peacekeepers protect non-peacekeeping strategies at their own expense META-PEACEKEEPING Police All-Cooperate Spiteful-Bully Tit-for-Tat
  51. 51. REDUCTIO AD ABSURDUM: ABSOLUTIST ∃t ∃j D(r, j, t) ⊕ D(c, j, t) Tit-for-Tat All-Cooperate Spiteful-Bully Absolutist
  52. 52. ABSOLUTISM UBER ALLES Tit-for-Tat All-Cooperate Spiteful-Bully Absolutist
  54. 54.  Frequentist: probability is the long-term frequency of events  Reasoning from absolute probabilities  What happens if an event only happens once?  Returns an estimate  Bayesian: probability is a measure of confidence that an event will occur  Reasoning from relative probabilities  Returns a probability distribution over outcomes  Update beliefs (confidence) as new evidence arrives TWO INTERPRETATIONS OF PROBABILITY $P(A \mid X) = \frac{P(X \mid A)\,P(A)}{P(X)}$
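
The Bayesian update on this slide can be sketched in a few lines. This is only an illustration: the two opponent types, their defection probabilities (0.9 and 0.1) and the observed moves are assumptions made up for the example, not data from the talk.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Return P(A|X) for every hypothesis A, given P(A) and P(X|A)."""
    evidence = sum(prior[a] * likelihood[a] for a in prior)  # P(X)
    return {a: prior[a] * likelihood[a] / evidence for a in prior}

# Hypothetical opponent types and how often each one defects (assumed values).
P_DEFECT = {"hostile": Fraction(9, 10), "friendly": Fraction(1, 10)}

# Start with no opinion either way, then update after each observed move.
belief = {"hostile": Fraction(1, 2), "friendly": Fraction(1, 2)}
for observed_defection in [True, True, False]:
    likelihood = {a: (p if observed_defection else 1 - p)
                  for a, p in P_DEFECT.items()}
    belief = bayes_update(belief, likelihood)

print(belief)
```

Each observation reweights the belief by how well each type explains it, which is the "update beliefs as new evidence arrives" step.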
  55. 55.  Probability distribution function: assigns probabilities to outcomes  Discrete: a finite set of values (enumeration)  Function also called a probability mass function  Poisson, binomial, Bernoulli, discrete uniform…  Continuous: arbitrary-precision values  Function also called a probability density function  Exponential, Gaussian (normal), chi-squared, continuous uniform…  Mixed: both discrete and continuous  Narrower distribution = greater certainty DISTRIBUTIONS $E[Z \mid \lambda] = \lambda$ (Poisson), $E[Z \mid \lambda] = \frac{1}{\lambda}$ (exponential)
  56. 56.  Game theory is great when you know the payoffs  What can you do if you don’t know the payoffs?  Or what the game tree looks like?  Well…  You usually have some educated guesses about who the players are  You have some idea what your possible actions are, as well as the other players’  You can look at past interactions and make inferences  Which of these can be random variables? All of them.  Deterministic: if all inputs are known, value is known  Stochastic: even if all inputs are known, still random YOU DON’T KNOW WHAT YOU DON’T KNOW
  57. 57.  Figure out what distribution to use  Figure out what parameter you need to estimate  Figure out a distribution for it, and any parameters  Observing data tells you what your priors are  Fixing values for stochastic variables  Markov Chain Monte Carlo: sampling the posterior distribution thousands of times DON’T WAIT — SIMULATE
  58. 58.  Prerequisites:  A Markov chain with an equilibrium distribution  A function f proportional to the density of the distribution you care about  Choose some initial set of values for all variables (state, S)  Modify S according to Markov chain state transitions  If f(S’)/f(S) ≥ 1, S’ is more likely than S, so accept  Otherwise, accept S’ with probability f(S’)/f(S)  Repeat CONVERGING ON EXPECTED VALUES
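
The accept/reject loop on this slide can be sketched as follows. It is a minimal sketch, not the sampler used in the talk: the target density (an unnormalised Gaussian centred on 3), the random-walk proposal, the step count and the burn-in length are all assumptions chosen for illustration. The simple acceptance ratio f(S')/f(S) relies on the proposal being symmetric.

```python
import math
import random

def metropolis(f, initial, propose, steps, rng):
    """Sample from the distribution whose density is proportional to f."""
    state = initial
    samples = []
    for _ in range(steps):
        candidate = propose(state, rng)
        ratio = f(candidate) / f(state)
        # Accept outright if the candidate is at least as likely as the
        # current state; otherwise accept with probability f(S')/f(S).
        if ratio >= 1 or rng.random() < ratio:
            state = candidate
        samples.append(state)
    return samples

def unnormalised_gaussian(x):
    """Hypothetical target: proportional to a normal density with mean 3."""
    return math.exp(-0.5 * (x - 3.0) ** 2)

def random_walk(x, rng):
    """Symmetric proposal: take a small uniform step left or right."""
    return x + rng.uniform(-1.0, 1.0)

rng = random.Random(0)
samples = metropolis(unnormalised_gaussian, 0.0, random_walk, 20000, rng)
kept = samples[2000:]  # discard early samples taken before the chain settles
mean_estimate = sum(kept) / len(kept)
print(mean_estimate)
```

The sampler only ever needs ratios of f, which is why f has to be proportional to the density rather than equal to it.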
  59. 59. A GAME WITHOUT PAYOFFS
      type Outcome = Measure (Bool, Bool)
      type Trust = Double
      type Strategy = Trust -> Bool -> Bool -> Measure Bool

      tit :: Trust -> Bool -> Bool -> Measure Bool
      tit me True _ = conditioned $ bern 0.9
      tit me False _ = conditioned $ bern me
  60. 60. CHOOSING WHICH HOLE TO FILL IN
      play :: Strategy -> Strategy -> (Bool, Bool) -> (Trust, Trust) -> Outcome
      play strat_a strat_b (last_a,last_b) (a,b) = do
        a_action <- strat_a a last_b last_a
        b_action <- strat_b b last_a last_b
        return (a_action, b_action)

      iterated_game :: Measure (Double, Double)
      iterated_game = do
        let a_initial = False
        let b_initial = False
        a <- unconditioned $ uniform 0 1
        b <- unconditioned $ uniform 0 1
        rounds <- replicateM 10 $ return (a, b)
        foldM_ (play tit tit) (a_initial, b_initial) rounds
        return (a, b)
  61. 61. LET’S PLAY A GAME
      games = [Just (toDyn False), Just (toDyn False),
               Just (toDyn False), Just (toDyn True),
               Just (toDyn False), Just (toDyn False),
               Just (toDyn False), Just (toDyn True),
               Just (toDyn False), Just (toDyn True),
               Just (toDyn False), Just (toDyn False),
               Just (toDyn False), Just (toDyn True),
               Just (toDyn False), Just (toDyn True),
               Just (toDyn False), Just (toDyn True),
               Just (toDyn False), Just (toDyn False)]

      do
        l <- mcmc iterated_game games
        return [makeHistogram 30 (Data.Vector.fromList $ map fst (take 5000 l)) "A's paranoia",
                makeHistogram 30 (Data.Vector.fromList $ map snd (take 5000 l)) "B's paranoia"]
  63. 63. MORE STRATEGIES
      allCooperate :: Trust -> Bool -> Bool -> Measure Bool
      allCooperate _ _ _ = conditioned $ bern 0.1

      allDefect :: Trust -> Bool -> Bool -> Measure Bool
      allDefect _ _ _ = conditioned $ bern 0.9

      grimTrigger :: Trust -> Bool -> Bool -> Measure Bool
      grimTrigger me True False = conditioned $ bern 0.9
      grimTrigger me False False = conditioned $ bern 0.1
      grimTrigger me _ True = conditioned $ bern 0.9
  64. 64. STRATEGY AS A RANDOM VARIABLE
      data SChoice = Tit | GrimTrigger | AllDefect | AllCooperate
        deriving (Eq, Ord, Enum, Typeable, Show)

      chooseStrategy :: SChoice -> Strategy
      chooseStrategy Tit = tit
      chooseStrategy AllDefect = allDefect
      chooseStrategy AllCooperate = allCooperate
      chooseStrategy GrimTrigger = grimTrigger

      strat :: Measure SChoice
      strat = unconditioned $ categorical [(AllCooperate, 0.25),
                                           (AllDefect, 0.25),
                                           (GrimTrigger, 0.25),
                                           (Tit, 0.25)]
  65. 65. LET’S PLAY ANOTHER GAME
      iterated_game2 :: Measure (SChoice, SChoice)
      iterated_game2 = do
        let a_initial = False
        let b_initial = False
        a <- unconditioned $ uniform 0 1
        b <- unconditioned $ uniform 0 1
        na <- strat
        let a_strat = chooseStrategy na
        nb <- strat
        let b_strat = chooseStrategy nb
        rounds <- replicateM 10 $ return (a, b)
        foldM_ (play a_strat b_strat) (a_initial, b_initial) rounds
        return (na, nb)

      do
        l <- mcmc iterated_game2 games
        return [makeDiscrete (map fst (take 1000 l)) "A strategy",
                makeDiscrete (map snd (take 1000 l)) "B strategy"]
  66. 66. WHO’S WHO?
  67. 67.  Probabilistic SIPD  Extensive form SIPD with signaling  And channels with decidable vs. heuristic recognisers  Coordination. Enough said.  System 1/System 2 conflict  Sentiment analysis → payoff data  Start small: the stroke is the smallest unit of interaction  Data where information about players is limited  IP flows  Anonymity networks  Signaling game about type: are two actors the same person? FUTURE WORK
  68. 68. QUESTIONS? @maradydd

Editor's Notes

  • This is mostly a talk about game theory, founded by John von Neumann and Oskar Morgenstern in 1944.

    Game theory is part of econ, which is way more than just macro/micro “where money goes”

    Weird that the study of decision-making is called “the dismal science,” though to be fair the more you look at the problem of allocating finite resources, the more hard truths you run up against about physics and human nature

    Game theory provides a framework for refining our decision-making models as more information about data’s structure comes in
  • “the circus” = social media

    I’m largely giving this talk because I’m tired of assholes being better at coordination than people who aren’t assholes.

    Keith Alexander is consulting for $600K/month on the grounds of some kind of behaviour analysis secret sauce. So, other people are thinking about these problems too.
  • Keep the Shannon/Weaver model of communication in your head: two endpoints communicating over a possibly noisy channel of finite bandwidth, who have to serialize their messages to the channel and parse incoming messages off the channel. Both serialization and parsing can produce errors.

    This isn’t really a langsec talk, but we’ll still be talking about boundaries of competence. In a signaling game, how much confidence you can have in the signal you received being the one that was transmitted depends on how reliably you can receive signals in the language of the channel – and how reliably the sender serializes them.

    We won’t be getting all that deeply into feedback loops, but if you know how they work, keep them in mind.

    I kinda lied about the only math you need being the ability to compare two numbers; it’ll help later in the talk if you can read first-order logic notation, but it’s not really necessary.
  • 1: I.e., effects on the environment.

    2: So important, they named a class of games after them.

    3: The quality of your data is really important here.

    4: Langsec won’t be making much of an appearance in this talk, but when all the agents are machines, it’s relevant. Who do you think is going to be driving all those automated exploit generators DARPA is soliciting? People? At first, maybe, but not for long. Drones are expensive and hard to build. More servers are not. And in any case, being able to tell where FLT matters and where it doesn’t is an important distinction. Decidable problems are priceless; for everything else there’s heuristics, and when those inevitably fail, there’s Mastercard.

    5: Game theory is the framework we’ll be building up this knowledge around, but we’ll be pulling from all the fields I mentioned earlier.
  • The four elements at the top are all you need to define a game.

    Strategies and equilibria are derived from the structure of the game you’re playing.
  • Behavior strategies and mixed strategies are functionally equivalent as long as the player has perfect recall (Kuhn’s theorem). Behavior strategies are also a bit more like how people act in real life.
  • First described in 1950 by Merrill Flood and Melvin Dresher

    Four payoffs: Temptation, for screwing the other guy; Reward, for mutual cooperation; Punishment, for mutual defection; and Sucker, for being defected on while cooperating.

    Because Reward > Punishment, mutual cooperation is better than mutual defection

    Because Temptation > Reward and Punishment > Sucker, defection is the dominant strategy for both agents

    It’s a dilemma because mutual cooperation is better than mutual defection, but at the *individual* level, defection is superior to cooperation.
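
The two inequalities above can be checked mechanically against the slide 10 payoffs. A small sketch; the helper name `strictly_dominates` is made up for this example.

```python
# Row player's payoffs from slide 10, indexed [row move][column move].
C, D = 0, 1
ROW_PAYOFF = [[-1, -3],
              [0, -2]]

reward = ROW_PAYOFF[C][C]      # both cooperate
sucker = ROW_PAYOFF[C][D]      # I cooperate, they defect
temptation = ROW_PAYOFF[D][C]  # I defect, they cooperate
punishment = ROW_PAYOFF[D][D]  # both defect

def strictly_dominates(payoff, move, other):
    """True if `move` pays more than `other` against every opposing move."""
    return all(payoff[move][opp] > payoff[other][opp]
               for opp in range(len(payoff[move])))

defect_dominates = strictly_dominates(ROW_PAYOFF, D, C)
mutual_cooperation_better = reward > punishment
print(defect_dominates, mutual_cooperation_better)
```

Both checks come out true at once, which is the dilemma: each player's best move individually leads to an outcome that is worse for both.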

  • Basically rock-paper-scissors but with only two options.

    There is no pure strategy that is a best response here, since what you always want is to choose the opposite of what your opponent picked.
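
One way to see this is to enumerate every cell and ask whether either player would rather have moved differently. The sketch below does that for Matching Pennies (slide 11) and, for contrast, the Stag Hunt (slide 13); the function name is an assumption for illustration.

```python
from itertools import product

def pure_nash_equilibria(game):
    """game[r][c] is a (row payoff, column payoff) pair.

    Returns every cell where neither player gains by changing only their own move.
    """
    rows, cols = range(len(game)), range(len(game[0]))
    equilibria = []
    for r, c in product(rows, cols):
        row_ok = all(game[r][c][0] >= game[alt][c][0] for alt in rows)
        col_ok = all(game[r][c][1] >= game[r][alt][1] for alt in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

MATCHING_PENNIES = [[(1, -1), (-1, 1)],
                    [(-1, 1), (1, -1)]]
STAG_HUNT = [[(2, 2), (0, 1)],
             [(1, 0), (1, 1)]]

pennies_equilibria = pure_nash_equilibria(MATCHING_PENNIES)
stag_equilibria = pure_nash_equilibria(STAG_HUNT)
print(pennies_equilibria, stag_equilibria)
```

Matching Pennies yields no cell at all, while the Stag Hunt yields both-Stag and both-Hare.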

  • Here, the mutually beneficial outcome is also the dominant outcome: there is no conflict between self-interest and mutual benefit. Still, it’s an interesting basis for a signaling game, since there’s still some incentive to screw the other guy.
  • The classic social cooperation game, originally described by Jean-Jacques Rousseau.

    Two pure-strategy equilibria: both cooperate or both defect. Cooperating is payoff dominant, defecting is risk dominant.

  • Chicken is more of an “anti-coordination game” – choosing the same action creates negative externalities, so you want to not coordinate
  • Proposed by John Maynard Smith and George Price in 1973 in Nature to describe conflict among animals over resources

    V is the value of the contested resource, C is the cost of getting into a fight

    Often considered as a signaling game – there’s a round of threatening each other before choosing their moves
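
With V and C defined, the slide 15 matrix lets you work out how often fighting pays. The sketch below is a standard indifference calculation, not something stated in the talk: against an opponent who fights with probability p, Fight and Share earn the same when p = V/C. The values V = 4 and C = 10 are hypothetical, chosen so that C > V.

```python
from fractions import Fraction

def expected_payoffs(v, c, p_fight):
    """Expected payoff of Fight and of Share against an opponent
    who fights with probability p_fight (payoffs from slide 15)."""
    fight = p_fight * Fraction(v - c, 2) + (1 - p_fight) * v
    share = (1 - p_fight) * Fraction(v, 2)
    return fight, share

V, C = 4, 10                 # hypothetical resource value and fight cost
p_star = Fraction(V, C)      # candidate indifference point
fight, share = expected_payoffs(V, C, p_star)
print(p_star, fight, share)
```

Below that fighting probability Fight earns more, above it Share does, so the more a fight costs relative to the prize, the rarer fighting should be.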
  • Also known as “conflicting interest coordination”

    One partner wants to go to the opera, the other wants to go to the ball game, but they’d both rather be together than go to different events. They forgot which one to go to, each knows that the other forgot, and they can’t communicate. Where should each go?

    Two pure strategy equilibria: both opera or both football. But this is unfair, since one person consistently gets a higher payoff than the other.

    One mixed strategy: go to your preferred event with 60% probability. But this is inefficient, because players miscoordinate 52% of the time, so the expected utility is 1.2, which is worse than if either person always goes to their non-preferred event.
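
The 52% and 1.2 figures can be reproduced directly from the slide 16 payoffs. A small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# (row payoff, column payoff); row prefers Opera, column prefers Football.
GAME = {("opera", "opera"): (3, 2), ("opera", "football"): (0, 0),
        ("football", "opera"): (0, 0), ("football", "football"): (2, 3)}

p_preferred = Fraction(3, 5)  # each player picks their favourite 60% of the time
row_mix = {"opera": p_preferred, "football": 1 - p_preferred}
col_mix = {"football": p_preferred, "opera": 1 - p_preferred}

# Probability the two end up at different events.
miscoordination = sum(row_mix[r] * col_mix[c] for (r, c) in GAME if r != c)

# Expected payoff for each player under the mixed strategies.
row_utility = sum(row_mix[r] * col_mix[c] * GAME[(r, c)][0] for (r, c) in GAME)
col_utility = sum(row_mix[r] * col_mix[c] * GAME[(r, c)][1] for (r, c) in GAME)
print(miscoordination, row_utility, col_utility)
```

Both players expect 6/5, which is less than the 2 either would get by always giving in.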
  • Types of games overlap in various ways

    Zero-sum: the gains/losses of all players balance out to zero. Matching Pennies is zero-sum; Prisoner’s Dilemma and Stag Hunt are non-zero sum.

    All zero-sum games are competitive; non-zero-sum games can be competitive or noncompetitive

    An action is just an action. There’s nothing inherently good or bad about choosing Heads or Tails in Matching Pennies; the morality of snitching in PD depends on your ethical framework around snitching, the morality of going off to hunt rabbits in Stag Hunt depends on whether you agreed to hunt a stag beforehand and how seriously you take keeping your word.

    As we go on, we’ll look at more complicated games – ones that go on longer, have more players, where players have uncertain information about each other, and even ones where the game being played changes form as the game goes on.
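
The zero-sum definition above is easy to test cell by cell. A minimal sketch over three of the matrices from the slides; `is_zero_sum` is a name made up for this example.

```python
def is_zero_sum(game):
    """True if the two payoffs in every cell cancel out exactly."""
    return all(row_payoff + col_payoff == 0
               for row in game
               for (row_payoff, col_payoff) in row)

GAMES = {
    "Matching Pennies": [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]],
    "Prisoner's Dilemma": [[(-1, -1), (-3, 0)], [(0, -3), (-2, -2)]],
    "Stag Hunt": [[(2, 2), (0, 1)], [(1, 0), (1, 1)]],
}

zero_sum = {name: is_zero_sum(game) for name, game in GAMES.items()}
print(zero_sum)
```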
  • Cournot equilibrium: Antoine Augustin Cournot, 1838. He was talking about businesses, e.g. factories, but it generalises.

    Nash equilibrium: nobody can do better by changing their strategy. In the Prisoner’s Dilemma, this is clear: any player who wants to cooperate knows that the other guy can defect on him and screw him, so he’s better off defecting.

    A subgame is a subtree of the game tree that can be played as a game in its own right. In a subgame perfect equilibrium, the players’ strategies form a Nash equilibrium in every subgame. To find one, start at the outcomes and work backward, removing branches that involve a player making a non-optimal move.

    “Trembling hand” – i.e., you might miss and hit the big red button instead
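
The "start at the outcomes, work backward" procedure for subgame perfect equilibrium can be sketched as a short recursion. The game used here is a hypothetical entry-deterrence example, not one from the talk: an entrant decides whether to enter a market, and the incumbent threatens to fight. The tree encoding (dicts with `name`, `player`, `actions`, `payoffs`) is likewise an assumption for illustration.

```python
def backward_induction(node):
    """Return (payoffs, plan); plan maps each decision node to its best action."""
    if "payoffs" in node:          # leaf: nothing left to decide
        return node["payoffs"], {}
    player = node["player"]
    best_action, best_payoffs, plan = None, None, {}
    for action, child in node["actions"].items():
        payoffs, child_plan = backward_induction(child)
        plan.update(child_plan)
        # Keep the action that pays the moving player the most.
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    plan[node["name"]] = best_action
    return best_payoffs, plan

# Hypothetical game: payoffs are (entrant, incumbent).
ENTRY_GAME = {
    "name": "entrant", "player": 0,
    "actions": {
        "stay out": {"payoffs": (0, 2)},
        "enter": {
            "name": "incumbent", "player": 1,
            "actions": {
                "fight": {"payoffs": (-1, -1)},
                "accommodate": {"payoffs": (1, 1)},
            },
        },
    },
}

payoffs, plan = backward_induction(ENTRY_GAME)
print(payoffs, plan)
```

The incumbent's threat to fight is non-credible: once the entrant is in, fighting costs the incumbent more than accommodating, so that branch is pruned and the entrant enters.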
  • Traditional game theory assumes that all agents are rational. But in the 1960s, Eric Berne looked at irrational games – the sorts of social games that people entice each other into for attention, sympathy, and other kinds of psychological payoffs, while hiding their true motives.

    Berne drops the assumption that players are driven by the most rational angels of their nature, and looks at the payoffs of ulterior-motive social games as ways for players to satisfy unmet emotional needs. So in effect we’re now considering players to have two sets of preferences that impact their decision-making: one that the rational System 2 uses when making considered decisions, one that the prerational System 1 uses when making quick heuristic decisions.
  • Humans are social animals. We all have biological drives to interact with other members of our species to some extent or another – and when that drive is demanding to be satisfied, an argument can serve the same purpose as a productive discussion or even a hug, if what a person is fundamentally looking for is external recognition that they exist.

    “Payoff” comes in the form of neurotransmitter activity. Berne didn’t go into that, and the imaging equipment we need to investigate this directly doesn’t exist yet, but we can black-box it (Skinner-box it?) with behaviorism: each player experiences some consequences from each interaction, as reinforcement or as punishment.

    Positive reinforcement – a rewarding stimulus (a chocolate, a kiss, &c)
    Negative reinforcement – removal of an aversive stimulus (eg when someone stops yelling at you)
    Positive punishment – an aversive stimulus
    Negative punishment – removal of a rewarding stimulus

    Berne identified stimulus hunger, recognition hunger, and structure hunger. Status hunger is probably a combination of the latter two.
  • Procedure: a series of complementary transactions toward some physical end.

    Operation: a set of transactions undertaken for a specific, stated purpose. If you ask explicitly for something, like reassurance or support, and you get it, that’s an operation.

    Ritual: “a stereotyped series of simple complementary transactions programmed by external social forces”

    Pastime: an iterated ritual, with state; can turn into status gaming (establishment of a “pecking order”)

    People spend a *lot* of time on pastimes – that’s why they’re called that. Facebook is largely a pastime for most people. So is Twitter. When different clusters’ pastimes collide, you get fireworks because pastimes have a ritual quality (jargon, signaling certain beliefs, &c) and people don’t know what pre-existing state they’re walking into.

    Game: “an ongoing series of complementary ulterior transactions progressing to a well-defined predictable outcome.” IOW, the initiator of the game has a goal in mind and isn’t being upfront about it. If you ask for reassurance and then turn that against the person, that’s a game.
  • Berne’s work is pretty heavily based in Freud; he’s got this parent/child/adult triad of “ego states”, and posits that people fall into authoritarian parent modes or contrarian child modes when they play power games with each other. It’s kind of a just-so story, so we’re not really going to get into it. But we will look at the roles that the context of various mind games establishes for the players.

    Since games are a series of complementary ulterior transactions, that means there’s turn-taking. Each move is considered to be a stroke, i.e., something that affects the other player in some way.

    Advantages ~ payoffs.
    Existential advantage is that sense that events in the world are confirming your beliefs about how the world works, even if you manipulated the events to that end.
    Emotional payoff here is analogous to positive reinforcement, external psychological advantage is analogous to negative reinforcement. If you win the game, you’re raising the likelihood that you’ll behave that way again, because you’ve reinforced the evidence that playing games works.
    Internal and external social advantage are about status and limiting other players’ moves. If you signal as “oppressed”, people who prioritize oppression will limit what they do on your behalf.

  • “Ain’t It Awful” taken to the pathological extreme manifests as things like Munchausen syndrome or M-by-proxy

    In “Why Don’t You – Yes But”, the initiator really wants reassurance that their problem is not their fault, but they get it manipulatively by challenging people to present solutions they can’t find fault with. Obviously they can nitpick anything to death.

    “Courtroom” – pick a victim/scapegoat and pick them apart, most effectively in front of a “jury of their peers”
  • Introduce the idea of changing the game here – the mark thinks it’s one game (the one where if he wins he gets laid at the end), but what he doesn’t know is that he’s playing a different game (the one where if he wins he doesn’t get beaten up but does lose his wallet).

    Can be played with just a victim and an aggressor, as long as the victim does something that the aggressor can construe as the victim screwing up in some way

    Confederate lures the victim into provoking the aggressor.
  • Often about getting the target to embarrass themselves in some way – typically by overreacting and saying something they’ll regret later. (I’m doubtful as to whether the target ever does actually regret it later, but we’ll set that aside for now.)

    Berne talks about there being an “apology->forgiveness” phase of the game, though trolls really aren’t in it for the forgiveness. So this might be better considered a modification.

    Note that a troll’s actions revolve around sending signals to some receiver in an attempt to provoke an overreaction. Engaging is therefore a feedback loop providing the troll with more material to feed into its signal generation function. Proceed with caution.

    And on that note, let’s take a closer look at the class of games that we can use to model interactions involving two-way communication: signaling games.
  • Get it out of your system now, because you’re going to hear “balls” more often than any other noun in the clips that follow. I counted.
  • This is the beginning of an extensive form game tree for this game.

    The unfilled dot in the center is the root. It indicates who makes the first move – in this case player 1.

    Traditionally the first move is made by “Nature” and is taken to be the type of the player – in a job interview, whether the candidate being interviewed is competent or incompetent; when you buy someone a drink, whether they’re interested in you or not interested in you; when you’re deciding whether to tell someone a secret, whether they’re trustworthy or untrustworthy.

    But since player 1 has already decided whether he’s going to split or steal, he’s making the first move.
  • Similar to Prisoner’s Dilemma, except that if you decide to screw each other, you both get screwed just as badly as you would if you cooperated but the other guy defected. Being a sucker isn’t any worse for you – materially, at least – than betting you can screw the other guy and being wrong.
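
The difference from the Prisoner's Dilemma shows up as weak rather than strict dominance. A small sketch comparing the row player's payoffs from slides 10 and 36; the `dominance` helper is a name invented for this example.

```python
C, D = 0, 1  # Split plays the role of cooperate, Steal of defect

# Row player's payoffs, indexed [row move][column move].
FRIEND_OR_FOE = [[1, 0],
                 [2, 0]]
PRISONERS_DILEMMA = [[-1, -3],
                     [0, -2]]

def dominance(payoff, move, other):
    """Classify how `move` compares with `other` across all opposing moves."""
    diffs = [payoff[move][opp] - payoff[other][opp]
             for opp in range(len(payoff[move]))]
    if all(d > 0 for d in diffs):
        return "strict"
    if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
        return "weak"
    return "none"

fof = dominance(FRIEND_OR_FOE, D, C)
pd = dominance(PRISONERS_DILEMMA, D, C)
print(fof, pd)
```

In Friend-or-Foe, stealing never hurts you but only helps if the other player splits; once they steal, your own choice makes no material difference to you.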
  • Poll the audience after this segment is over. What do they think Ibrahim will pick? What do they think Nick will pick?

    Radiolab interviewed both these guys after the show. In the studio, the argument went on for 45 minutes and the audience was booing Nick over and over again. He stuck to his guns the whole time, so in uncompressed time, his signal was fairly unambiguous.
  • We don’t know whether Nick has actually chosen Split or Steal at this point. He’s signaled unambiguously that he plans to steal, which means that if Ibrahim decides his signal is credible, Ibrahim can only operate on the lower right quadrant of the graph.

    At this point, Nick’s signal has changed the structure of the game they’re playing: it’s no longer Friend-or-Foe, it’s Ultimatum. <stuff about Ultimatum here> So the risk Nick is taking now is whether Ibrahim will decide that the ultimatum is so insulting that he should punish Nick by forcing them both to go home with nothing, or whether the promise of £6800 after the show is a credible enough incentive that he should cooperate.

    Takeaway: extensive form helps you see how a game’s structure changes as branches of the decision tree are pruned away
  • Axelrod’s initial tournaments just played strategies against each other 200x and totaled up points at the end. In ecological (or evolutionary) tournaments, each strategy’s success in the previous round determines how prevalent it is in the current round – and cooperative strategies outcompeted non-cooperative ones.

    It would be really great if players in the real world abandoned bad strategies as soon as they recognised the strategies weren’t working, but in practice people are actually pretty bad at recognising this. People are unusually invested in the strategies they choose. Confirmation bias, choice-supportive bias, &c.

    Complex inferences just didn’t work very well – the inferences were usually wrong.
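
    A minimal sketch of an ecological tournament, to make the "success determines prevalence" idea concrete. This is not Axelrod's code: the payoff values (5/3/1/0), the three strategies, and the proportional update rule are all assumptions chosen for illustration.

```python
# Assumed payoffs: (my move, their move) -> (my points, their points).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def all_c(mine, theirs):
    return "C"


def all_d(mine, theirs):
    return "D"


def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"


def play_match(s1, s2, rounds=200):
    """Play two strategies against each other and return their total scores."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        score1, score2 = score1 + p1, score2 + p2
        h1.append(m1)
        h2.append(m2)
    return score1, score2


def ecological_tournament(strategies, generations=100):
    """Each generation, a strategy's population share grows in proportion
    to how well it scored against the current population mix."""
    names = list(strategies)
    score = {(a, b): play_match(strategies[a], strategies[b])[0]
             for a in names for b in names}
    shares = {name: 1.0 / len(names) for name in names}
    for _ in range(generations):
        fitness = {a: sum(shares[b] * score[(a, b)] for b in names)
                   for a in names}
        mean = sum(shares[a] * fitness[a] for a in names)
        shares = {a: shares[a] * fitness[a] / mean for a in names}
    return shares


STRATEGIES = {"ALL-C": all_c, "ALL-D": all_d, "TFT": tit_for_tat}
```

    With this mix, ALL-D does well only while there are ALL-C players left to exploit; once they thin out, it starves and TFT takes over.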
  • In Axelrod’s IPD, success – i.e., doing the best you can possibly do – requires a strategy that satisfies all these properties. Such strategies also outcompete strategies that don’t satisfy these properties.

    But can we do better than an eye for an eye and a tooth for a tooth? Certainly in the real world there are plenty of people whose modus operandi is moving from victim to victim, opportunistically defecting whenever they think they can get away with it; and remember Berne’s games. Are there strategies that can incorporate other information to expose social predators?
  • c is the column player and r is the row player (i.e., you); p is the last round, and b() is a predecessor function.

    TFT: “Defect on them if they defected on me last round.”

    TFTT: “Defect on them if they defected on me last round and the round before.”

    Grim: “Defect on them if they ever defected on me in the past.”

    Bully: “Defect on them if they’ve *never* defected on me in the past.” Spiteful-Bully is similar, but also defects once it’s been defected on three times.

    Vigilante: “Defect on them if they defected on anyone else last round.”

    Police: “Defect on them if they defected on me last round, or if last round they defected on someone who had just cooperated with everyone.”
    Vigilante and Police are peacekeeping strategies: they ignore whom a player defected on and care only that the player defected.
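
    Some of these rules can be sketched as predicates over a shared history. The history layout (a list of rounds, each mapping an (actor, target) pair to "C" or "D") is an assumption for illustration, and so is reading Vigilante's "anyone else" as "any player at all". Each function returns True when r should defect on c.

```python
from typing import Dict, List, Tuple

Round = Dict[Tuple[str, str], str]   # (actor, target) -> "C" or "D"
History = List[Round]                # oldest round first


def defected(history: History, actor: str, target: str, rounds_back: int = 1) -> bool:
    """Did actor defect on target `rounds_back` rounds ago?"""
    if len(history) < rounds_back:
        return False
    return history[-rounds_back].get((actor, target)) == "D"


def tft(history: History, r: str, c: str) -> bool:
    # Defect if c defected on r last round.
    return defected(history, c, r, 1)


def tftt(history: History, r: str, c: str) -> bool:
    # Defect only if c defected on r in both of the last two rounds.
    return defected(history, c, r, 1) and defected(history, c, r, 2)


def grim(history: History, r: str, c: str) -> bool:
    # Defect if c has ever defected on r.
    return any(rnd.get((c, r)) == "D" for rnd in history)


def vigilante(history: History, r: str, c: str) -> bool:
    # Defect if c defected on any player last round (assumed reading).
    if not history:
        return False
    return any(actor == c and action == "D"
               for (actor, _target), action in history[-1].items())
```

    Note that TFT, TFTT and Grim only ever look at entries where r is the target; Vigilante is the first one that needs to see the whole table.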
  • All individually nice strategies are communally nice, but not necessarily vice versa. All individually forgiving strategies are communally forgiving, and all communally retaliatory strategies are individually retaliatory.

    Individually retaliatory: defects on someone who defects on it.
    Communally retaliatory: defects on someone who defects on anyone.
    Individually forgiving: stops defecting on someone who stops defecting on it
    Communally forgiving: stops defecting on someone who stops defecting on everyone

    TFT is loyal; if it plays another TFT, they’ll cooperate forever. Same for Police, but Vigilante is not loyal – Vigilantes will defect on other Vigilantes. TFT is individually nice, retaliatory and forgiving; Vigilante is communally nice, retaliatory and forgiving.
  • Absolutist: “Defect on c iff c has ever cooperated with someone when you defected, or vice versa.”

    Absolutist is loyal: it doesn’t defect on other Absolutists of its own kind. Note that if you put two groups of Absolutists into a population, they’ll defect on each other.

    It’s also unforgiving: it never stops defecting on someone once it’s started, like Grim.

    Neither individually nice nor communally nice, since it will defect on All-C (cooperated in the past with a defector)

    Really only works when there’s no noise in players’ information or actions
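
    A sketch of Absolutist under one possible reading of the rule: r defects on c if, in any round, the two of them treated the same third party differently. Both that reading and the history layout (a list of rounds mapping (actor, target) to "C" or "D") are assumptions.

```python
from typing import Dict, List, Tuple

Round = Dict[Tuple[str, str], str]   # (actor, target) -> "C" or "D"


def absolutist(history: List[Round], r: str, c: str) -> bool:
    """True if r should defect on c: c has ever cooperated with someone
    r defected on in the same round, or vice versa."""
    for rnd in history:
        targets = {target for (_actor, target) in rnd}
        for t in targets - {r, c}:
            mine, theirs = rnd.get((r, t)), rnd.get((c, t))
            # Only compare when both of us actually played against t.
            if mine is not None and theirs is not None and mine != theirs:
                return True
    return False
```

    Because the check runs over the whole history, one disagreement is enough to condemn c permanently, which is the unforgiving, Grim-like behaviour described above. It also shows why All-C gets punished: cooperating with a defector that Absolutist is punishing counts as a disagreement.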
  • The frequentist perspective operates under the assumption that the long-term absolute probability of an event occurring can be known.
    The Bayesian interpretation is a subjective one, depending entirely on the information available to the agent.

    For a large enough number of samples – as evidence accumulates – the Bayesian and frequentist interpretations typically converge. But you don’t always have all that many samples to choose from.

    Really big data problems can be solved by frequentist analysis. But for medium-sized data and really small data, Bayesian analysis performs much better.

    A is the parameters, X is the evidence.
    P(A): prior probability of A. A belief, i.e., a measure of confidence.
    P(A|X): posterior probability of A, given X – the conditional probability of A, based on evidence X.
    P(X|A): probability of X, given A – the likelihood, i.e., the probability of the evidence given the parameters.
    (Avoiding the post hoc ergo propter hoc fallacy, statistically.)
    P(X) decomposes to P(X|A)P(A) + P(X|~A)P(~A): the probability that X occurs whether A happens or not
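
    Putting those pieces together gives Bayes' theorem, P(A|X) = P(X|A)P(A) / P(X). A small worked sketch, with made-up numbers: A is "this player is the defecting type" and X is "we just saw them defect".

```python
def posterior(prior_a: float, likelihood_x_given_a: float,
              likelihood_x_given_not_a: float) -> float:
    """P(A|X) via Bayes' theorem, with P(X) expanded over A and not-A."""
    evidence = (likelihood_x_given_a * prior_a
                + likelihood_x_given_not_a * (1.0 - prior_a))
    return likelihood_x_given_a * prior_a / evidence


# Hypothetical numbers: 20% prior belief that the player is a defector type,
# defector types defect 90% of the time, everyone else 10% of the time.
belief_after_one_defection = posterior(0.2, 0.9, 0.1)
```

    Feeding the result back in as the next prior is how belief accumulates over repeated observations.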
  • Probability mass function: gives the probability that a discrete random variable has some particular value
    Poisson describes counts of events (non-negative integers); binomial gives the probability that an event occurs k times over N trials, given probability p that it occurs in any one trial; Bernoulli is binomial with one trial.

    Expected value of Z in the Poisson distribution is equal to its parameter, lambda; in the exponential distribution, it’s equal to the inverse of the parameter.

    Probability density function: gives the relative likelihood of a continuous random variable near some particular value (the probability of any single exact value is zero); for the probability over a range, take the integral of the variable’s density over that range.

    All that we see is Z. We have to estimate lambda, and that’s why Bayesian analysis is useful: it gives us useful tools for updating our beliefs about lambda even though we can’t see it.

    Figuring out the right distribution to use with your data is important. There are a lot of them, useful in different situations, and that’s outside the scope of this talk.
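
    The distributions named above are short enough to write out by hand. This is a standard-library sketch for checking the claims numerically; the function names are just for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Z = k) for a Poisson random variable with parameter lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k occurrences in n trials), each trial succeeding with probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def bernoulli_pmf(k: int, p: float) -> float:
    """Bernoulli is just binomial with a single trial."""
    return binomial_pmf(k, 1, p)


def exponential_prob_between(lam: float, a: float, b: float) -> float:
    """P(a <= Z <= b) for an exponential variable: the integral of the
    density lam * exp(-lam * z) from a to b."""
    return math.exp(-lam * a) - math.exp(-lam * b)


def poisson_mean(lam: float, terms: int = 100) -> float:
    """Expected value computed from the PMF; should come out close to lam."""
    return sum(k * poisson_pmf(k, lam) for k in range(terms))
```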
  • We’re treating “input” here as anything that influences the value of a variable. Determinism entails decidability.
  • So you’ve got some data! What are you going to do with it?
    Questions to ask yourself when modeling:
    What am I interested in?
    What does it look like?
    What influences it?

    Data conditions the values of random variables: the conditional distribution of Y given X is the probability distribution of Y when X is known to be a particular value.

    You can keep on assigning distributions to parameters as long as it’s useful, but if you don’t have any strong beliefs about a parameter, this is probably not useful. Pick an average value and let inference update it for you. Or you can also use a uniform distribution for it, and infer what its value is likely to be. It’s just another prior, after all.

    Monte Carlo simulation: also co-developed by John von Neumann (with Stanislaw Ulam). In normal MC, variables are independent and identically distributed; sample and average. In MCMC, variables can condition each other, and the conditioning defines the chain. When you combine probabilities, you’re reducing the effective volume of your search space; MCMC helps you narrow the search to the areas where you’re likely to find values that satisfy the data and the conditions.
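
    "Sample and average" in its plainest form looks something like this. It covers only the independent, identically distributed case, not MCMC; the exponential target and the sample count are arbitrary choices for illustration.

```python
import random


def monte_carlo_mean(draw, n_samples: int = 100_000) -> float:
    """Estimate an expected value by averaging independent draws."""
    return sum(draw() for _ in range(n_samples)) / n_samples


def estimate_exponential_mean(lam: float, seed: int = 0) -> float:
    """Plain Monte Carlo estimate of E[Z] for Z ~ Exponential(lam).

    The exact answer is 1 / lam, so this is only a sanity check on the method.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return monte_carlo_mean(lambda: rng.expovariate(lam))
```

    The estimate tightens roughly with the square root of the number of samples, which is why plain MC gets expensive once the interesting region of the search space is small.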
  • With this definition, the payoffs are completely hidden; all we assume is that the players consider some actions to be “cooperating” and others to be “defecting,” and that whether they consider an action to be cooperative or defecting is conditioned on how trusting they are. In this case, a higher value means “more paranoid.”

    If the other player defects on them (the True case), then the probability distribution of this player defecting is a Bernoulli distribution with p = 0.9 – this parameter could have been a random variable as well, but for this toy example we’re fixing its value.

    If the other player cooperates, then the probability that this player defects is also a Bernoulli distribution, with p = whatever the player’s paranoia is.
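
    In plain Python, without any probabilistic programming library, that strategy definition might look like the following sketch. The names are illustrative, not the ones on the slide.

```python
import random


def defect_probability(other_defected_last_round: bool, paranoia: float) -> float:
    """Parameter p of the Bernoulli distribution over 'this player defects'."""
    if other_defected_last_round:
        return 0.9       # fixed for the toy example; could be a random variable
    return paranoia      # otherwise defect as often as the player is paranoid


def choose_action(other_defected_last_round: bool, paranoia: float,
                  rng: random.Random) -> bool:
    """Draw one Bernoulli sample; True means defect."""
    return rng.random() < defect_probability(other_defected_last_round, paranoia)
```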
  • Here, a and b are players A’s and B’s paranoia values; we don’t know what they are, we just know that they’re chosen uniformly from values between 0 and 1, inclusive.

    When we sample hypothetical games with these players, each game will last 10 rounds. The actions sampled will converge on the strategy we defined on the last slide – defecting based on whether the other player defected the last round, conditioned by how paranoid this player is – and from the values we observe in the samples after Markov chain convergence (hopefully!), we can get a better estimate of how paranoid A and B are.
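
    To show what the sampler is doing under the hood, here is a hand-rolled random-walk Metropolis sketch for one player's paranoia. It is a stand-in, not the library code from the slide. The observed actions are invented, and treating the round before the first as "the other player cooperated" is an assumption.

```python
import math
import random


def log_likelihood(paranoia, own_actions, other_actions):
    """Log-probability of own_actions (True = defect) given a paranoia value."""
    total, other_prev = 0.0, False   # assumption: nothing to react to at first
    for own, other in zip(own_actions, other_actions):
        p = 0.9 if other_prev else paranoia
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # keep log() finite at 0 and 1
        total += math.log(p if own else 1.0 - p)
        other_prev = other
    return total


def infer_paranoia(own_actions, other_actions, n_samples=20000,
                   burn_in=2000, step=0.1, seed=0):
    """Sample the posterior over paranoia under a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    current = 0.5
    current_ll = log_likelihood(current, own_actions, other_actions)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, step)
        if 0.0 <= proposal <= 1.0:            # outside the prior: always reject
            proposal_ll = log_likelihood(proposal, own_actions, other_actions)
            if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
                current, current_ll = proposal, proposal_ll
        if i >= burn_in:                      # discard the pre-convergence draws
            samples.append(current)
    return samples


# Invented 10-round game: B always cooperates, A defects twice.
A_ACTIONS = [False, True, False, False, False, True, False, False, False, False]
B_ACTIONS = [False] * 10
```

    With B always cooperating, every round tells us something about A's paranoia, and the exact posterior is Beta(3, 9) with mean 0.25, so the sample mean should land near that. Rounds that follow a defection by B tell us nothing about paranoia, since the 0.9 is fixed.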
  • For Grim Trigger, the fact that we’ve defected on a previous round tells us that we should continue to defect on that person. Note that we’re not making this conditional on paranoia.
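
    A deterministic sketch of that rule. That the first defection is triggered by the other player defecting is an assumption, taken from the earlier definition of Grim; the function names are illustrative.

```python
def grim_defects(we_defected_before: bool, they_defected_last_round: bool) -> bool:
    """No paranoia parameter: once we have defected, we keep defecting."""
    return we_defected_before or they_defected_last_round


def play_grim(their_actions):
    """Grim's actions (True = defect) against a fixed opponent sequence."""
    defected_before, they_prev, ours = False, False, []
    for theirs in their_actions:
        action = grim_defects(defected_before, they_prev)
        ours.append(action)
        defected_before = defected_before or action
        they_prev = theirs
    return ours
```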

  • Probabilistic SIPD: How large of a sample do we actually need to infer a player’s strategy?

    Inference about System 1 vs. System 2 influencing a player’s actions will require modeling the preferences and strategies of each system separately, and modeling how they interact