Survey of the computational complexity and computability of sequential decision making (games, planning).
Contains two detailed proofs:
- EXPSPACE-completeness of unobservable adversarial planning for the existence of a 100% winning strategy (Haslum et al.)
- undecidability of unobservable adversarial planning for an arbitrary winning rate (including optimal play in the Nash sense)
Undecidability in partially observable deterministic games (Olivier Teytaud)
Presented at Dagstuhl 2010
Accepted in IJFCS:
@article{david:hal-00710073,
hal_id = {hal-00710073},
url = {http://hal.inria.fr/hal-00710073},
title = {{The Frontier of Decidability in Partially Observable Recursive Games}},
author = {Auger, David and Teytaud, Olivier},
abstract = {{The classical decision problem associated with a game is whether a given player has a winning strategy, i.e. some strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even with deterministic games (no random part in the game), (ii) that it is in the jump 0', and that, even in the stochastic case, (iii) it becomes decidable if we add the requirement that the game halts almost surely whatever maybe the strategies of the players.}},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Special Issue on "Frontier between Decidability and Undecidability"}},
publisher = {World Scientific},
journal = {International Journal on Foundations of Computer Science (IJFCS)},
volume = {Accepted},
note = {revised 2011, accepted 2011, in press },
audience = {internationale },
year = {2012},
}
Introductory talk; more technical details in:
@inproceedings{schoenauer:inria-00625855,
hal_id = {inria-00625855},
url = {http://hal.inria.fr/inria-00625855},
title = {{A Rigorous Runtime Analysis for Quasi-Random Restarts and Decreasing Stepsize}},
author = {Schoenauer, Marc and Teytaud, Fabien and Teytaud, Olivier},
abstract = {{Multi-Modal Optimization (MMO) is ubiquitous in engineer- ing, machine learning and artificial intelligence applications. Many algo- rithms have been proposed for multimodal optimization, and many of them are based on restart strategies. However, only few works address the issue of initialization in restarts. Furthermore, very few comparisons have been done, between different MMO algorithms, and against simple baseline methods. This paper proposes an analysis of restart strategies, and provides a restart strategy for any local search algorithm for which theoretical guarantees are derived. This restart strategy is to decrease some 'step-size', rather than to increase the population size, and it uses quasi-random initialization, that leads to a rigorous proof of improve- ment with respect to random restarts or restarts with constant initial step-size. Furthermore, when this strategy encapsulates a (1+1)-ES with 1/5th adaptation rule, the resulting algorithm outperforms state of the art MMO algorithms while being computationally faster.}},
language = {Anglais},
affiliation = {TAO - INRIA Saclay - Ile de France , Microsoft Research - Inria Joint Centre - MSR - INRIA , Laboratoire de Recherche en Informatique - LRI},
booktitle = {{Artificial Evolution}},
address = {Angers, France},
audience = {internationale },
year = {2011},
month = Oct,
pdf = {http://hal.inria.fr/inria-00625855/PDF/qrrsEA.pdf},
}
Weather, opponents, geopolitics: so many uncertainties in such a setting. How to manage power systems in spite of these uncertainties, and how to decide on investments?
Talk at Saint-Etienne in 2015; thanks to R. Leriche and to the "games and optimizations" days in Saint-Etienne.
Machine learning 2016: deep networks and Monte Carlo Tree Search (Olivier Teytaud)
This talk describes two key machine learning algorithms, namely MCTS and Deep Networks (DN), presented as the main AI innovations of the last 20 years. Interestingly, the talk was given a few days before an MCTS+DN combination was used by Google DeepMind to win against a professional player (https://docs.google.com/document/d/1ZjniEJiotdCfvBYI3MTBpjtOTSlUvf3ma7V8DHVmjhk/edit#)
Talk ENS-Lyon at "Sept Laux"
Combining UCT and Constraint Satisfaction Problems for Minesweeper (Olivier Teytaud)
@inproceedings{buffet:hal-00750577,
hal_id = {hal-00750577},
url = {http://hal.inria.fr/hal-00750577},
title = {{Optimistic Heuristics for MineSweeper}},
author = {Buffet, Olivier and Lee, Chang-Shing and Lin, Woanting and Teytaud, Olivier},
abstract = {{We present a combination of Upper Confidence Tree (UCT) and domain specific solvers, aimed at improving the behavior of UCT for long term aspects of a problem. Results improve the state of the art, combining top performance on small boards (where UCT is the state of the art) and on big boards (where variants of CSP rule).}},
language = {English},
affiliation = {MAIA - INRIA Nancy - Grand Est / LORIA , Department of Computer Science and Information Engineering - CSIE , National University of Tainan - NUTN , TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI , Department of Electrical Engineering and Computer Science - Institut Montefiore},
booktitle = {{International Computer Symposium}},
address = {Hualien, Taiwan},
audience = {internationale },
year = {2012},
pdf = {http://hal.inria.fr/hal-00750577/PDF/mines3.pdf},
}
An overview of best-response mechanisms (Nisan, Schapira, Zohar, Valiant 2011) and their applications to BGP and other protocols. Also a probabilistic extension in which players can make mistakes (Ferraioli-Penna 2013).
Recurrent Neural Networks from the point of view of dynamical systems and state machines (GAYO3)
General explanation of various recurrent frameworks and the intuitions behind them.
Part 1: Focus on sampling series from continuous time.
Part 2: Explain the connection between state machines and languages, with some ideas from NLP.
Presentation by Stefan Dziembowski, associate professor and leader of the Cryptology and Data Security Group, University of Warsaw, at the BIU workshop on Bitcoin. Covered exclusively by vpnMentor.com.
Theory of games, with a short reminder of computational complexity and an independent appendix on human complexity and the game of Go
Complexity of planning and games with partial information
1. Sequential decision making: decidability and complexity
Searching with partial observation
Olivier.Teytaud@inria.fr + too many people to all be cited. Includes Inria, Cnrs, Univ. Paris-Sud, LRI, Taiwan universities (including NUTN), CITINES project.
TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence.
Bielefeld, September 2012.
2. A quite general model
A directed graph (finite).
A starting point on the graph, a target (or several targets, with different rewards).
I want to reach a target.
Labels (= decisions) on edges:
Next node = f(current node, decision)
Each node is either:
- a random node (random decision)
- a decision node (I choose a decision)
- an opponent node (an opponent chooses)
3. Partial observation
Each decision node is equipped with an observation; you can make decisions using the list of past observations
==> you don't know where you are in the graph
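The model of slides 2-3 can be sketched in a few lines of Python (a toy illustration; the graph, node names, and `play` helper are all hypothetical, not from the talk). Two rooms share the same observation, so a policy must rely on the history of observations rather than the current node:

```python
import random

# Slide 2's model: a finite directed graph, decisions as edge labels,
# and each node is a decision node, a random node, or terminal.
# Slide 3: decision nodes only expose an observation, not their identity.
GRAPH = {
    "start": {"kind": "decision", "obs": "fork", "edges": {"a": "mid1", "b": "mid2"}},
    "mid1":  {"kind": "random",   "obs": "room", "edges": {"x": "goal", "y": "trap"}},
    "mid2":  {"kind": "decision", "obs": "room", "edges": {"a": "goal", "b": "trap"}},
    "goal":  {"kind": "terminal", "obs": "win",  "edges": {}},
    "trap":  {"kind": "terminal", "obs": "lose", "edges": {}},
}

def play(policy, rng):
    """Run one game; the policy only sees the list of past observations."""
    node, history = "start", []
    while GRAPH[node]["kind"] != "terminal":
        info = GRAPH[node]
        history.append(info["obs"])
        if info["kind"] == "random":
            action = rng.choice(sorted(info["edges"]))  # random node: random decision
        else:
            action = policy(tuple(history))             # decision node: I choose
        node = GRAPH[node]["edges"][action]
    return node

# A history-based policy: play "b" at the fork, then "a" in the room.
policy = lambda history: "b" if history == ("fork",) else "a"
```

Here `play(policy, random.Random(0))` reaches `"goal"` surely; note that `mid1` and `mid2` emit the same observation `"room"`, which is exactly the sense in which "you don't know where you are in the graph".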
4. Overview
● 10%: overview of Alternating Turing Machines & computational complexity (a great tool for complexity upper bounds)
● 50%: general culture on games (including undecidability)
● 35%: general culture on fictitious play (matrix games) (probably no time for this...)
● 4%: my results on that stuff
==> 2 detailed proofs (one new)
==> feel free to interrupt
5. Outline
● Complexity and ATM
● Complexity and games (incl. planning)
● Bounded horizon games
7. Complexity and alternating Turing machines
● Turing machine (TM) = abstract computer
● Non-deterministic Turing Machine (NTM) = TM with "exists" states (i.e. several transitions; accepts if at least one accepts)
● Co-NTM: TM with "for all" states (i.e. several transitions; accepts if all lead to accept)
● ATM: TM with both "exists" and "for all" states.
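The "exists"/"for all" acceptance rules above map directly onto OR/AND evaluation of a computation tree. A minimal sketch (toy tree, not an actual Turing machine; all names are my own):

```python
# Acceptance for alternating computations as AND/OR tree evaluation:
# an "exists" state accepts if SOME successor accepts (NTM-style),
# a "forall" state accepts if ALL successors accept (co-NTM-style);
# an ATM freely mixes both kinds.
def accepts(node):
    kind, children = node
    if kind in ("accept", "reject"):
        return kind == "accept"
    results = [accepts(c) for c in children]
    return any(results) if kind == "exists" else all(results)

ACC, REJ = ("accept", []), ("reject", [])
tree = ("forall", [("exists", [REJ, ACC]),   # some child accepts -> True
                   ("exists", [ACC, REJ])])  # some child accepts -> True
```

`accepts(tree)` is `True`: both "exists" children find an accepting successor, so the "for all" root accepts; replacing the root with `"exists"` over two `REJ` leaves would give `False`.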
13. Outline
● Complexity and ATM
● Complexity and games (incl.
planning)
● Bounded horizon games
14. Computational complexity: framework
Uncertainty can be:
– Adversarial: I focus on the worst case
– Stochastic: I focus on the average result
– Or both.
"Stochastic = adversarial" if the goal is 100% success.
"Stochastic != adversarial" in the general case.
15. Computational complexity: framework
Many representations for problems. E.g.:
– Succinct: a circuit computes the i-th bit of the probability that action a leads to a transition from s to s'
– Compressed: a circuit computes many bits simultaneously
– Flat: longer encoding (transition tables)
==> does not matter for decidability
==> matters for complexity
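The gap between succinct and flat representations can be made concrete (a toy sketch with my own naming, not Mundhenk's formalism): the same deterministic dynamics written as a tiny program versus an explicit transition table.

```python
# One transition system, two encodings (slide 15's distinction):
# succinct: a short program on n-bit states (description size ~ n)
# flat: an explicit table over all states (size 2^n)
N_BITS = 3

def succinct_step(state):
    """Increment modulo 2^N_BITS - a constant-size description."""
    return (state + 1) % (1 << N_BITS)

flat_table = {s: succinct_step(s) for s in range(1 << N_BITS)}  # 2^n entries

# Identical dynamics, exponentially different input sizes - which is why
# the representation changes the complexity but not the decidability.
```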
16. Computational complexity:
framework
Many representations for problems. E.g.:
– Succinct
– Compressed
– Flat
Compressed representation “somehow” natural
(state space has exponential size, transitions
are fast): see e.g. Mundhenk for detailed defs
and flat representations.
17. Computational complexity: framework
We mainly use the compressed representation; see also Mundhenk for flat representations.
Typically, exponentially smaller representations lead to exponentially higher complexity
==> but it's not always the case...
Simple things can change the complexity a lot:
"superko": rules forbid repeating the same position; some fully observable 2-player games become EXPSPACE instead of EXP ==> discussed later
18. Computational complexity: framework for first tables of results
Either search (find a target) or optimize (cumulate rewards over time).
Compressed (written with circuits or otherwise) or not (flat).
Horizon:
- Short horizon: horizon ≤ size of input
- Long horizon: log2(horizon) ≤ size of input
- Infinite horizon: no limit
20. Mundhenk's summary: one player, non-negative reward, looking for non-negative average reward (= positive probability of reaching the target): easier
21. Complexity, partial observation, infinite horizon, probability of reaching a target
● 1P + random, unobservable: undecidable (Madani et al.)
● 1P + random, P(win) = 1, or equivalently 2P, P(win) = 1: [Rintanen and refs therein]
– Fully observable: EXP [Littman 94]
– Unobservable: EXPSPACE [Haslum et al. 2000]
– Partial observability: 2EXP [Rintanen, 2003]
Rmk: "2P, P(win)=1" is not "2P"!
22. Complexity, partial observation, infinite horizon
● 2P vs 1P (a team of two against one), P(win)=1?: undecidable! [Hearn, Demaine]
● 2P (random or not):
– Existence of a sure win: equivalent to 1P + random!
● EXP, fully observable (e.g. Go, Robson 1984)
● PSPACE, unobservable
● 2EXP, partially observable
– Existence of a sure win, same state forbidden: EXPSPACE-complete (Go with Chinese rules? rather conjectured EXPTIME or PSPACE...)
– General case (optimal play): undecidable (Auger, Teytaud) (what about phantom-Go?)
23. Complexity, partial observation
Remarks:
● Continuous case?
● Purely epistemic (we gather information, we don't change the state)? [Sabbadin et al.]
● Restrictions on the policy, on the set of actions...
● Discounted reward
● DEC-POMDP, POSG: many players, same/opposite/different reward functions...
24. What are the approaches?
– Dynamic programming (Massé and Bellman, 1950s) (still the main approach in industry), alpha-beta, retrograde analysis
– Reinforcement learning
– MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006)
– Scripts + tuning / Direct Policy Search
– Coevolution
All have their PO extensions, but the last two are the most convenient in this case.
25. Partially observable games
Many tools for fully observable games.
Not so many for partially observable ones.
● Shi-Fu-Mi (Rock Paper Scissor)
● Card games
● Phantom games
26. Shi-Fu-Mi (Rock-Paper-Scissors)
● Fully observable in simultaneous play, but partially observable in the turn-based version.
● Computers are stronger than humans (yes, it's true).
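How can a computer be stronger than humans at Rock-Paper-Scissors? Not by game theory (the uniform mixed strategy only draws in expectation) but by opponent modelling. A minimal frequency-counting sketch (my own toy example, not from the talk):

```python
from collections import Counter

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def frequency_bot(opponent_history):
    """Play the move that beats the opponent's most frequent past move."""
    if not opponent_history:
        return "rock"  # arbitrary opening
    most_common = Counter(opponent_history).most_common(1)[0][0]
    return BEATS[most_common]
```

Against a human biased toward `rock`, the bot converges to `paper`; real RPS bots add pattern detection over longer histories, but the principle is the same: exploit deviations from the equilibrium strategy.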
27. Card games, phantom games
● Phantomized version of a game:
– You don't see the moves of your opponents
– If you play an illegal move, you are informed that it's illegal and you play again
– Usually, you get some more information (captures, threats...) <== game-dependent
● Phantom games:
– phantom-Chess = Kriegspiel ==> Dark Chess: more info
– phantom-Go
– etc.
28. Partially observable games
● Usually quite heuristic algorithms
● The best performing algorithms combine:
– Opponent modelling (as for Shi-Fu-Mi)
– A belief state (often by Monte-Carlo simulations)
– Not a lot of tree search
– A lot of tuning
==> usually no consistency analysis
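The "belief state" ingredient can be sketched exactly (here by exhaustive filtering; the Monte-Carlo variant mentioned above samples candidate states instead of enumerating them; the toy chain below is my own example):

```python
def update_belief(belief, action, observation, step, observe):
    """Keep the successors of the current candidates that match the new observation."""
    successors = {step(s, action) for s in belief}
    return {s for s in successors if observe(s) == observation}

# Toy hidden chain on integers: "+1" adds 1, "+2" adds 2; we observe only parity.
step = lambda s, a: s + (1 if a == "+1" else 2)
observe = lambda s: s % 2

belief = {0, 1, 2}                                       # unknown start state
belief = update_belief(belief, "+1", 1, step, observe)   # keep odd successors only
```

After one step the belief is `{1, 3}`: the successors of {0, 1, 2} under "+1" are {1, 2, 3}, and observing parity 1 rules out 2.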
29. Part I: Complexity analysis (unbounded horizon)
– Game:
● One or two players
● Win, loss, draw (incl. endless loop)
– Partial observability, no random part
– Finite state space:
● state = transition(state, action)
● action decided by each player in turn
30. State of the art
- makes sense in fully observable games
- not so much in non-observable games
31. State of the art
EXPTIME-complete in the general
fully-observable case
32. EXPTIME-complete fully
observable games
- Chess (for some nxn generalization)
- Go (with no superko)
- Draughts (international or English)
- Chinese checkers
- Shogi
33. PSPACE-complete fully observable games
- Amazons
- Hex
- Go-moku
- Connect-6
- Qubic
- Reversi
- Tic-Tac-Toe
(polynomial horizon + full observation ==> PSPACE)
Many games where each cell is filled once and only once
34. EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete
(games in succinct form, infinite horizon).
(still for the 100%-win "UD" criterion - for not fully observable cases it is necessary to be precise...)
Importantly, the UD criterion means that strategies are the same whether the opponent has full observation or no observation ==> UD is very bad :-(
35. EXPSPACE-complete
unobservable games (Haslum & Jonsson)
The two-player unobservable case is
EXPSPACE-complete
(games in succinct form).
PROOF:
(I) First note that strategies are just sequences of actions
(no observability!)
(II) It is in EXPSPACE=NEXPSPACE (by Savitch's theorem),
because of the following algorithm:
(a) Non-deterministically choose the sequence of
actions (an exponential-length list of actions is enough...)
(b) Check the result against all possible opponent strategies
(III) It remains to check the hardness only.
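Step (II) can be sketched in Python on a tiny, explicitly given game (a toy sketch, not Haslum & Jonsson's construction: the transition function, states, and horizon below are made-up assumptions, and the real difficulty comes from the succinct encoding). The key point it illustrates is that with no observability, a pure strategy is just a sequence of actions, so "P1 has a sure win" means "some P1 action sequence beats every P2 action sequence":

```python
from itertools import product

def play(transition, state, seq1, seq2):
    # Run the game: both players' moves are fixed in advance
    # (no observability ==> no feedback to react to).
    for a1, a2 in zip(seq1, seq2):
        state = transition(state, a1, a2)
    return state

def sure_win(transition, start, is_win, actions, horizon):
    """Does some P1 action sequence win against ALL P2 sequences?"""
    for seq1 in product(actions, repeat=horizon):
        if all(is_win(play(transition, start, seq1, seq2))
               for seq2 in product(actions, repeat=horizon)):
            return True
    return False

# Toy game 1: P2's action is ignored, so P1 can force a win.
t1 = lambda s, a1, a2: (s + a1) % 3
# Toy game 2: P2 can cancel P1's moves, so no sure win exists.
t2 = lambda s, a1, a2: (s + a1 - a2) % 3
```

Here the nondeterministic guess of step (a) is replaced by exhaustive enumeration; the succinct (exponentially compressed) game description is what pushes the real problem up to EXPSPACE.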
38. EXPSPACE-complete
unobservable games (Haslum & Jonsson)
The two-player unobservable case is
EXPSPACE-complete
(games in succinct form).
PROOF of the hardness:
Reduction from: is my TM with exponential tape
going to halt ?
Consider a TM with tape of size N=2^n.
We must find a game
- with size n ( n = log2(N) )
- such that player 1 has a winning
strategy iff the TM halts.
39. EXPSPACE-complete
unobservable games: encoding a Turing machine
with a tape of size N as a game with state O(log(N))
Player 1 chooses the sequence of
configurations of the tape (N=4):
x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
x(1,1),x(1,2),x(1,3),x(1,4)
x(2,1),x(2,2),x(2,3),x(2,4)
x(3,1),x(3,2),x(3,3),x(3,4)
.....................................
x(N,1), x(N,2), x(N,3), x(N,4)
Player 1 wins by reaching the final state !
Except if P2 finds an illegal transition:
P2 can check the consistency of one 3-uple per line
==> this requires space log(N)
( = the position of the 3-uple)
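The reason one 3-uple per line suffices is the standard tableau argument: cell i of configuration t+1 is determined by cells i-1, i, i+1 of configuration t, so a single position (which costs only log(N) bits to name) witnesses any cheating. A minimal sketch, with a hypothetical local rule (a one-token cellular automaton standing in for a real Turing machine; a real TM rule would also use the right neighbour):

```python
def local_rule(left, center, right):
    # Hypothetical toy rule: a token 'H' moves one cell to the right.
    if left == "H":
        return "H"
    if center == "H":
        return "_"
    return center

def consistent_at(prev_conf, next_conf, i):
    """P2's check: does cell i of next_conf follow from the 3-uple
    (i-1, i, i+1) of prev_conf?  Borders are treated as blanks."""
    left = prev_conf[i - 1] if i > 0 else "_"
    right = prev_conf[i + 1] if i < len(prev_conf) - 1 else "_"
    return next_conf[i] == local_rule(left, prev_conf[i], right)

def line_is_legal(prev_conf, next_conf):
    # P1's line survives only if EVERY position is consistent;
    # P2 refutes it by exhibiting a single bad position i.
    return all(consistent_at(prev_conf, next_conf, i)
               for i in range(len(prev_conf)))
```

For example, with the token rule above, `"H___" -> "_H__"` is a legal step, while `"H___" -> "__H_"` is refuted by position 1.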
42. EXPSPACE-complete
unobservable games
The one-player case with unknown
initial state and no observability is
EXPSPACE-complete
(games in succinct form).
The two-player unobservable case as well.
43. 2EXPTIME-complete PO games
The two-player PO case,
or 1P+random PO, is
2EXPTIME-complete
(games in succinct form).
(2P = 1P+random because of UD)
44. Undecidable games (R. Hearn)
The three-player PO case is
undecidable. (two players against one,
not allowed to communicate)
45. Hummm ?
Do you know a PO game in which you can
ensure a win with probability 1 ?
46. Another formalization
==> much more satisfactory
(might have drawbacks as well...)
47. Madani et al.
1 player + random = undecidable
(even without opponent!)
48. Madani et al.
1 player + random = undecidable.
==> answers a (related) question by
Papadimitriou and Tsitsiklis.
Proof ?
Based on the emptiness problem for
probabilistic finite automata (see Paz 71):
Given a probabilistic finite automaton,
is there a word accepted with proba at least c ?
==> undecidable
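For intuition, the object behind Paz's emptiness problem can be sketched as follows: a probabilistic finite automaton assigns each word an acceptance probability, computed by chaining stochastic matrices, and the undecidable question is whether some word exceeds the threshold c. The two transition matrices below are made-up toy numbers, not from Paz:

```python
import numpy as np

def acceptance_proba(word, init, matrices, accepting):
    """init: initial distribution over states; matrices[letter] is a
    row-stochastic transition matrix; accepting: 0/1 vector marking
    accepting states."""
    dist = np.array(init, dtype=float)
    for letter in word:
        dist = dist @ matrices[letter]   # push the distribution forward
    return float(dist @ accepting)       # mass on accepting states

# Hypothetical 2-state PFA over the alphabet {a, b}:
matrices = {
    "a": np.array([[0.5, 0.5], [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0], [0.3, 0.7]]),
}
p = acceptance_proba("ab", [1.0, 0.0], matrices, np.array([0.0, 1.0]))
```

Computing this probability for a *given* word is easy; the undecidability is in quantifying over all words.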
53. A random node to be rewritten
Rewritten as follows:
● Player 1 chooses a in [[0,N-1]]
● Player 2 chooses b in [[0,N-1]]
● c=(a+b) modulo N
● Go to tc
Each player can force the game to be equivalent to
the initial one (by playing uniformly)
==> the proba of winning for player 1 (in case of perfect play)
is the same as for the initial game
==> undecidability!
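A quick sanity check of the claim above (a minimal sketch): if player 1 draws a uniformly in {0,...,N-1}, then c = (a+b) mod N is uniform whatever b is, so either player alone can force the rewritten node to behave exactly like the original random node:

```python
from fractions import Fraction

def distribution_of_c(N, b):
    """Exact distribution of c=(a+b)%N when a is uniform on {0..N-1}
    and b is fixed (the opponent's choice)."""
    probs = [Fraction(0)] * N
    for a in range(N):
        probs[(a + b) % N] += Fraction(1, N)
    return probs

# Uniform play by player 1 yields a uniform c for EVERY choice of b.
N = 5
for b in range(N):
    assert distribution_of_c(N, b) == [Fraction(1, N)] * N
```

By symmetry the same holds with the roles of a and b exchanged, which is why the value of the rewritten (random-free) game equals the value of the original stochastic game.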
54. Important remark
Existence of a strategy winning with
proba 0.5 is also undecidable for the
restriction to games in which the optimal proba
is >0.6 or <0.4 ==> this is not just a subtle
precision issue.
55. So what ?
We have seen that
unbounded horizon
+ partial observability
+ natural criterion (not sure win)
==> undecidability
contrary to what is expected from the usual definitions.
What about bounded horizon, 2P ?
– Clearly decidable
– Complexity ?
– Algorithms ? (==> coevolution & LP)
57. Part II: Fictitious play (bounded
horizon) in the antagonist case
Fictitious play ?
Somehow an abstract version of
antagonist coevolution with full memory
● unlimited population (finite, but
increasing): one more individual per iteration
● perfect choice of each mutation against
the current population of opponents
58. Part II: Fictitious play in the
zero-sum case
Why zero-sum cases ?
Evolutionarily stable solutions (found by
FP) are usually sub-optimal (as is nature
when choosing lions' strategies or cheating
behaviors in the Scaly-breasted Munia)
59. What is a matrix 0-sum game ?
● A matrix M is given (type n x m).
● Player 1 chooses (privately) i in [[1,n]]
● Player 2 chooses j in [[1,m]]
● Reward
= Mij for player 1
= -Mij for player 2 (zero-sum game)
==> Model for finite antagonist games
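The model above, in code: for mixed strategies p (player 1) and q (player 2), the expected reward of player 1 is p^T M q, and player 2's reward is its negation. A minimal sketch:

```python
import numpy as np

def expected_reward(M, p, q):
    """Expected reward of player 1 when the players use mixed
    strategies p and q on the matrix game M (player 2 gets the
    negation, since the game is zero-sum)."""
    return float(np.asarray(p) @ np.asarray(M) @ np.asarray(q))

# Matching-pennies-style example: uniform play gives expected reward 0.
M = np.array([[1.0, -1.0], [-1.0, 1.0]])
r = expected_reward(M, [0.5, 0.5], [0.5, 0.5])
```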
60. Nash equilibrium
● Nash equilibrium: there is a distribution
of probability for each player
(= mixed strategy)
such that the reward is optimum (for the
worst case on the distribution of
probabilities by the opponent)
● Linear programming is a polynomial
algorithm for finding the Nash eq.
● FP= tool for approximating it
(at least in 0-sum cases)
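One way to realize "LP finds the Nash equilibrium" for the row player, sketched here assuming SciPy's `linprog` is available: maximize v subject to (p^T M)_j >= v for every column j, sum(p)=1, p>=0. The variables are [p_1..p_n, v], and since `linprog` minimizes, we minimize -v:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Optimal mixed strategy and game value for the row player."""
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                        # minimize -v  (maximize v)
    # constraints: v - (p^T M)_j <= 0 for each column j
    A_ub = np.hstack([-M.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0                   # sum(p) = 1 (v excluded)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]   # p >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:n], res.x[-1]

# Matching pennies: value 0, optimal strategy (1/2, 1/2).
p, v = solve_zero_sum([[1.0, -1.0], [-1.0, 1.0]])
```

The column player's strategy is obtained from the dual (or by solving the transposed problem), which is why one LP suffices in the zero-sum case.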
61. Fictitious play (Brown 1949)
● Each player starts with a distribution on
its strategies
● Each player in turn:
– Finds an optimal strategy against the
current opponent's distribution (randomly
break ties)
– Adds it to its distribution (the distrib. does
not sum to 1!)
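The loop above can be sketched as a minimal fictitious play for matrix games: the growing counts play the role of the unnormalized "distribution", and normalizing them gives the empirical mixed strategies:

```python
import numpy as np

def fictitious_play(M, iterations, seed=0):
    """Brown's fictitious play on the matrix game M; returns the
    empirical mixed strategies of both players."""
    M = np.asarray(M, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = M.shape
    counts1, counts2 = np.zeros(n), np.zeros(m)
    counts1[0] += 1                 # arbitrary initial strategies
    counts2[0] += 1
    for _ in range(iterations):
        # best response of P1 to P2's empirical mixture (random ties)
        u1 = M @ counts2
        counts1[rng.choice(np.flatnonzero(u1 == u1.max()))] += 1
        # best response of P2 (the minimizer) to P1's empirical mixture
        u2 = counts1 @ M
        counts2[rng.choice(np.flatnonzero(u2 == u2.min()))] += 1
    return counts1 / counts1.sum(), counts2 / counts2.sum()

# Matching pennies: empirical frequencies approach (1/2, 1/2).
M = np.array([[1.0, -1.0], [-1.0, 1.0]])
p, q = fictitious_play(M, 5000)
```

By Robinson's theorem the empirical frequencies converge to a Nash equilibrium in zero-sum games, although the play itself cycles with ever-longer runs.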
67. Improvements for KxK matrix
game: approximations
● There exist approximate solutions with support
size O(log(K)/ε²) [Althoefer]
● Such an approximation can be found in
time O(K log(K)/ε²) [Grigoriadis et al]: basically a
stochastic FP
69. Improvements for KxK matrix
game: exact solution if k-sparse
● Exact solution in time (Auger, Ruette, Teytaud)
O( K log(K) · k^(2k) + poly(k) )
if some solution is k-sparse (good only if k is
smaller than log(K)/log(log(K)) !
better ?)
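In the spirit of the stochastic FP of Grigoriadis et al, here is a minimal exponential-weights self-play sketch. This is a simplified stand-in, not their algorithm: the actual method samples a single matrix entry per step, while this deterministic variant uses full payoff vectors; the averaged strategies approximate the Nash equilibrium:

```python
import numpy as np

def exp_weights_selfplay(M, iterations, eta=0.05):
    """Both players run exponential weights against each other;
    returns the time-averaged mixed strategies."""
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    w1, w2 = np.ones(n), np.ones(m)
    avg1, avg2 = np.zeros(n), np.zeros(m)
    for _ in range(iterations):
        p, q = w1 / w1.sum(), w2 / w2.sum()
        avg1 += p
        avg2 += q
        w1 *= np.exp(eta * (M @ q))    # maximizer reinforces good rows
        w2 *= np.exp(-eta * (p @ M))   # minimizer penalizes bad columns
    return avg1 / iterations, avg2 / iterations

# Rock-paper-scissors: the averages approach the uniform Nash
# equilibrium; their exploitability is bounded by the regret.
M = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
p, q = exp_weights_selfplay(M, 10000)
```

The standard no-regret argument bounds the exploitability of the averaged strategies by the sum of both players' average regrets, which vanishes as the number of iterations grows.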
70. Improvements for KxK matrix
game: approximations
So, LP & FP are two tools for matrix
games.
Linear programming can be adapted to PO
games without building the complete
matrix (using information sets).
The same for FP variants ?
71. Conclusions
There are still natural questions which
provide nice decidability problems
Madani et al (1 player against random, no observability), extended here to
2 players with no random
==> undecidable problems “less than”
the Halting problem ?
Solving zero-sum matrix-games is still an
active area of research
● Approximate cases
● Sparse case
72. Open problems
● Phantom-Go undecidable ? (or other “real” game...)
● Complexity of Go with Chinese rules ?
(conjectured: PSPACE- or EXPTIME-complete;
proved: PSPACE-hard and in EXPSPACE)
● More to say about “epistemic” games (internal
state not modified)
● Frontier of undecidability in PO games ?
(100% halting game: 2P become decidable)
● Chess with finitely many pieces on an infinite board:
decidability of forced mate ?
(mate-in-n: Brumleve et al, 2012, by simulation
in Presburger arithmetic; thanks S. Riis :-) )