A COGNITIVE GAME-PLAYING SYSTEM
FOR TABLETOP GAMES
Cognitive Systems Institute Group
Department of Computer Science ◇ Tetherless World Constellation
Rensselaer Polytechnic Institute, Troy, NY 12180
Thursday, 12th November, 2015
- 5th-year PhD student in Computer Science
- Supervised by Professor Jim Hendler since 2013
- Led RPI MiniDeepQA R&D team, Summer 2013
- CS background is practical, not theoretical
- Programmer for 30+ years
- Developed games for 20+ years (including industry)
- Current research: using ‘cognitive computing’ for game AI
- “Cognitive game-playing” system, Aleph
- Future plan: build a Dungeons & Dragons-playing A.I. agent
- Lots of games!…
- Computers play some games well and some badly, but why?
- Arises from multiple aspects of the game design
- Data structure, rule complexity, bluffing, openness…
- Game theory defines other parameters for games
- Zero-/Non-zero-sum, Deterministic/Stochastic, Impartial/Partisan…
- But IBM Watson won at Jeopardy!, a “humans-only” game
- Jeopardy! has a massive search space
- How did Watson manage it, with ≤3 seconds per question?
- Serious hardware (~2,800 IBM Power7 cores)
- More importantly…
Epstein et al., “Making Watson Fast”. IBM J Res Dev 56 (3/4), May/July 2012, p. 15:2
- The first ‘cognitive computing’-based tabletop game AI system
- Sadly not the first game AI: that title has been claimed
  - e.g. Dannenhauer & Muñoz-Avila (2013): “Case-based Goal Selection Inspired by IBM’s Watson” (ICCBR 2013)
- Uses a pipeline-style approach to play games
- Iterative search is used only where necessary
- Different evaluation techniques are applied
  - Absolute scoring (i.e. ‘what is my score for this move?’)
  - Influence maps & stacks (similar to multi-layer analysis, MLA)
  - Simple strategic analysis & reflection
- … was inspired by the design of the DeepQA pipeline
- … is informed by consideration of how people play games
- … uses numerous tools (“evaluators”) to judge game state
  - Evaluators correspond to the sections and subsections of the pipeline
PRIMARY GENERAL ANALYSIS
Where can I play?
Where can I not play?
What can I play?
SECONDARY GENERAL ANALYSIS
What is my score?
Can I win this turn?
Do I have any valuable tiles?
What is my position like?
What moves exist?
Do chains of moves exist?
PRIMARY MOVE SCORING
Will this advance my position?
What would my new score(s) be?
Who is winning?
What tile might come up next?
Can I disrupt a player’s game?
What happens if I play tile M?
Can I control more of the board?
How many tiles can I play now?
Can I swap hands? Should I do so?
Should I retain tile Q for later?
How can I use tile X best?
Does tile Y give me any benefit?
Can I perform combo move Z?
FINAL SCORING AND RANKING
Which move has the highest score?
What other moves score highly?
Which move gives me the highest score?
How well does this move fit my tactics?
Should I change my gameplay?
Is it worth playing a lesser move now?
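A minimal sketch of the staged pipeline above, in Python for brevity (Aleph itself is written in C++). Everything here is an assumption made for illustration: the dict-based game state, the `Stage` and `run_pipeline` names, and the three toy evaluators are hypothetical and are not taken from Aleph or UIMA. Each evaluator answers one of the questions listed for its stage and adds its answer to the shared state.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types: the state is a plain dict, and an evaluator is any
# function that reads the state and returns new annotations to merge in.
Evaluator = Callable[[dict], dict]

@dataclass
class Stage:
    name: str
    evaluators: List[Evaluator] = field(default_factory=list)

def run_pipeline(stages, state):
    """Run each stage in order; every evaluator adds annotations to the state."""
    annotated = dict(state)
    for stage in stages:
        for evaluate in stage.evaluators:
            annotated.update(evaluate(annotated))
    return annotated

# Toy evaluators for an imaginary tile game (illustrative only).
def legal_cells(state):
    # "Where can I play?": every empty cell is treated as playable.
    return {"legal": [c for c, tile in state["board"].items() if tile is None]}

def score_moves(state):
    # "What would my new score(s) be?": current score plus the cell's bonus.
    return {"scores": {c: state["score"] + state["bonus"].get(c, 0)
                       for c in state["legal"]}}

def rank_moves(state):
    # "Which move has the highest score?"
    return {"best_move": max(state["scores"], key=state["scores"].get)}

stages = [
    Stage("primary general analysis", [legal_cells]),
    Stage("primary move scoring", [score_moves]),
    Stage("final scoring and ranking", [rank_moves]),
]

state = {
    "board": {(0, 0): "A", (0, 1): None, (1, 0): None},
    "bonus": {(0, 1): 2, (1, 0): 5},
    "score": 10,
}
result = run_pipeline(stages, state)
print(result["best_move"])
```

Because later evaluators only read what earlier ones wrote, a stage can be extended with another evaluator without touching the rest of the pipeline.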
- Developed using C-IMA, a partial implementation of UIMA
- Written in C++ using Boost libraries
  - Programmer familiarity
  - Curiosity: can it be done?
- System constructed from pipelines and evaluators
- Pipelines and evaluators are equivalent to UIMA flow models and
- Game test platform is kept largely separate
- Game logic is designed as a set of reentrant modules
- Game state data are stored in a specific container class
- Two primary techniques used currently
  - Absolute score
  - Influence map
- Absolute score
  - “What is my score if I make this move?”
  - Value-based score
- Influence map
  - Modified form of multi-layer analysis
  - Board representations which can be stacked for analysis
  - Positional evaluation
  - Frequently used in RTS games for
  - Also used in go A.I.
- Each map represents a set of information; e.g.…
  - What is my score if I play at location (x, y)?
  - Number of players controlling location (p, q)
- Data are useful individually
- Data are more useful in combination
- Individual layers can be formed into a stack for analysis
- Individual stack frame consists of:
  - X, Y grid of cells
  - Weight (importance of the set relative to the other data sets)
  - Active flag (determines if this layer is used in computation)
- Each cell consists of:
  - Weight (importance of an individual cell within its own frame)
- Computation over stack columns generates result frame
- Result frame is an influence map which provides combined information about the state of the game
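A minimal sketch of a frame stack and its column-wise computation, in Python for brevity. The slides list only a weight per cell and do not say how columns are combined, so two things here are assumptions: each cell also carries a `value` (the datum the layer records), and the combination is a weighted sum. The names `Cell`, `Frame`, `combine` and `frame_from` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cell:
    value: float          # assumed: the datum this layer records for the cell
    weight: float = 1.0   # importance of this cell within its own frame

@dataclass
class Frame:
    cells: List[List[Cell]]  # X, Y grid of cells, indexed cells[y][x]
    weight: float = 1.0      # importance of this frame relative to the others
    active: bool = True      # inactive frames are skipped in the computation

def combine(stack):
    """Weighted sum down each stack column (an assumed combination rule)."""
    height = len(stack[0].cells)
    width = len(stack[0].cells[0])
    result = [[0.0] * width for _ in range(height)]
    for frame in stack:
        if not frame.active:
            continue
        for y in range(height):
            for x in range(width):
                cell = frame.cells[y][x]
                result[y][x] += frame.weight * cell.weight * cell.value
    return result

def frame_from(values, weight=1.0, active=True):
    """Build a frame from a grid of raw values, with unit cell weights."""
    return Frame([[Cell(v) for v in row] for row in values], weight, active)

score_layer = frame_from([[1, 4], [2, 0]], weight=1.0)    # e.g. score per location
control_layer = frame_from([[0, 2], [1, 1]], weight=0.5)  # e.g. players per location
ignored_layer = frame_from([[9, 9], [9, 9]], active=False)

influence = combine([score_layer, control_layer, ignored_layer])
print(influence)
```

Switching a layer's active flag, or changing its weight, alters the result frame without rebuilding any of the underlying layers.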
- Needed to guide the agents’ play
- Otherwise the agents will play no better than randomly
- Traditionally provided by heuristic in search algorithm
- Minimax (or variant) generates scores for multiple plies
- Scores are propagated back up the search tree
- Branch leading to best/optimum remaining future move is chosen
  - Greedy algorithm
- For games with a complex search space, this is impractical
- Cannot search deeply enough forward for meaningful analysis
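A minimal sketch of the traditional approach described above: depth-limited minimax over a toy game tree, in Python for brevity. The tree encoding (a number is a leaf score, a list is a set of child positions), the function names, and the placeholder score of 0 at the search horizon are all assumptions for illustration.

```python
def minimax(node, maximizing, depth):
    """Return the backed-up score of `node`.

    A node is either a number (a leaf holding a score) or a list of
    child nodes. `depth` limits how many plies are searched.
    """
    if isinstance(node, (int, float)):
        return node
    if depth == 0:
        return 0  # search horizon reached: placeholder, no information
    child_scores = [minimax(child, not maximizing, depth - 1) for child in node]
    return max(child_scores) if maximizing else min(child_scores)

def best_branch(root, depth):
    """Index of the root branch with the best backed-up score."""
    scores = [minimax(child, False, depth - 1) for child in root]
    return scores.index(max(scores))

# We move first, the opponent replies, and one line runs a ply deeper.
tree = [[3, 5], [2, 9], [4, [6, 1]]]

print(best_branch(tree, 3))  # full-depth search
print(best_branch(tree, 2))  # horizon cuts off the deepest line
```

In this toy tree the two searches pick different branches, which is the problem noted above: when the search cannot reach far enough forward, the scores it propagates back are not meaningful.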
- “Deep thought” module performs analysis over evaluators
- Which evaluators work well or badly is still being determined
- Different strategies use somewhat different inputs
- Heuristics are necessarily simple
- Can be as simple as a set of if (...) statements
- Again, a matter of research to see what works well
- Aim is to provide the agent with a degree of self-reflection
- Ability to judge its own performance using provided criteria
- Based on results, the agent may elect to change its strategy
  - Strategies are pre-programmed, not deduced by the agents during play
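A minimal sketch of this kind of self-reflection, in Python for brevity: a handful of if-statements judge the agent's performance and choose among pre-programmed strategies. The strategy names, the inputs and the thresholds are hypothetical, chosen only to illustrate the idea; the slides do not specify Aleph's actual criteria.

```python
def choose_strategy(current, my_score, leader_score, turns_left):
    """Judge performance with simple if-statements and pick a strategy.

    All strategy names and thresholds are illustrative assumptions.
    """
    deficit = leader_score - my_score
    if deficit <= 0:
        return current                # leading or level: keep the strategy
    if turns_left <= 3:
        return "disrupt_leader"       # behind and nearly out of time
    if deficit > 10:
        return "control_board"        # far behind: play for position
    return "maximise_score"           # slightly behind: score harder

print(choose_strategy("maximise_score", my_score=10, leader_score=30, turns_left=8))
```

The agent only selects from a fixed menu here, matching the point above that strategies are pre-programmed rather than deduced during play.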
- Watson demonstrated the efficacy of ‘cognitive computing’
- Aleph is the first in a new kind of AI for tabletop games
- “Cognitive game-playing”
- Design is capable of playing extremely complex games
- Turn-by-turn strategic analysis guides the agents’ play
- Considerable scope for future development
- Improving Aleph to make it more challenging for human players
- Other games, e.g. go, Civilization, Magic: The Gathering, chess
- IBM Watson ‘cogs’ + “cognitive game-playing” = …?
  - Dungeons & Dragons, maybe?...
I would like to thank my supervisor, Professor Jim Hendler, for his continued support and advice, and for taking a chance on a stranger with some crazy ideas and offering me the initial
opportunity to work with Watson. I would also like to thank Dr Chris Welty and Dr Siddharth Patwardhan for their assistance and insights which led semi-directly to this work, Dr Bijan Parsia
(University of Manchester, UK) for his timely intervention in asking difficult questions which I had been avoiding, and Professor Selmer Bringsjord (RPI) for his consistently insightful comments
and observations. Additionally, sincere thanks are due to Dr Jonathan Dordick and Mr John Kolb (RPI) for their support, and to my other friends and colleagues at RPI likewise for theirs.