Labex2012g
 

    Labex2012g Presentation Transcript

    • Distributed decision making: partially observable dynamic games and multiobjective policy optimization. Olivier.Teytaud@inria.fr + too many people to cite them all (includes Inria, Cnrs, Univ. Paris-Sud, LRI). TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. DigiWorld, September 2012. In a nutshell: we optimize strategies, with parallel machines, and we test on games, and we apply to energy.
    • Intro: so many words... Distributed, decision making, partially observable, dynamic, games, multiobjective, policy, optimization.
    • Let's explain: decision making + policy optimization, dynamic games, partially observable, distributed, multiobjective. Decision making: it's all about making decisions. Humans in the loop, or not.
    • Let's explain: policy. We provide policies. It's not graphical interfaces or data visualization, it's providing strategies.
    • Let's explain: optimization. It's numerical. We have objective functions, and we optimize. It's science, not astrology.
    • Let's explain: games. We have rules, and a system evolves according to these rules. Games: Chess, the game of Go, draughts (roughly useless, but convincing and easy to experiment with).
    • Games: yes, MineSweeper is really important.
    • Games: nearly nobody trusts an industrial experiment (in particular if the effects are supposed to be a reduction of risk over a 50-year horizon...). But many people trust an experiment on games.
    • Games: first wins against professional players in the game of Go ==> opened various doors for us (we are very grateful to strong pros like Kim Myung-Wang!)
    • Let's explain: games. We have rules, and a system evolves according to these rules. Besides board games, there is industrial stuff: a group of power plants.
    • Industrial stuff: group of power plants. Examples: renewable energy, nuclear power plant, coal, hydroelectric power plant.
    • Hydroelectric power plant: involves state variables (stock levels).
    • Industrial stuff: group of power plants + electricity consumers. Depends on weather, economy, ...
    • Industrial stuff: group of power plants + electricity consumers + electric network of lines. Capacity. Demand = production (certainly not just production >= demand!).
    • So we have state variables, uncertainties, time steps, long-term effects... ==> this is termed a dynamic game.
    • Industrial stuff: group of power plants + electricity consumers + electric network + weather. Weather can be modeled by a probability distribution ==> not adversarial uncertainty.
    • Industrial stuff: ... + weather + economy. Economy: modeled by a probability distribution?
    • Industrial stuff: ... + weather + economy + technical innovations (in particular, smart grids!). Worst case may be better than probabilistic models: adversarial uncertainty.
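The ingredients named in the slides above (state variables such as stock levels, random demand and weather, time steps, and the constraint production = demand) can be put together in a few lines. The following is a minimal sketch, not the authors' simulator: the single reservoir, the thermal plant, the function names and every number are assumptions made up for illustration.

```python
import random

def simulate(policy, horizon=52, seed=0):
    """Simulate one hydro reservoir plus one thermal plant (hypothetical model).

    `policy(t, stock, demand)` returns the hydro production it would like.
    Returns the total cost of thermal production over the horizon.
    """
    rng = random.Random(seed)
    stock = 50.0          # state variable: water stock level
    max_stock = 100.0     # reservoir capacity (made-up value)
    thermal_cost = 3.0    # cost per unit of thermal production (made-up value)
    total_cost = 0.0
    for t in range(horizon):
        demand = rng.uniform(8.0, 12.0)   # uncertainty: consumers
        inflow = rng.uniform(0.0, 10.0)   # uncertainty: weather
        stock = min(max_stock, stock + inflow)
        # The wish of the policy is clipped to what is physically possible.
        hydro = min(stock, demand, max(0.0, policy(t, stock, demand)))
        thermal = demand - hydro          # production must equal demand
        stock -= hydro
        total_cost += thermal_cost * thermal
    return total_cost

def use_water_first(t, stock, demand):
    """Cover as much of the demand as possible with water."""
    return demand

def keep_half(t, stock, demand):
    """Cover only half of the demand with water, the rest with thermal."""
    return 0.5 * demand
```

Any function of (time step, stock, demand) can be passed as a policy. Since the random draws depend only on the seed, two policies can be compared on exactly the same scenario, for example `simulate(use_water_first)` against `simulate(keep_half)`.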
    • Climate change, peak oil, pollution, nuclear wastes... Important problems. We want to work numerically on this.
    • Let's be simple, 1. We want electricity. We prefer no nuclear waste. We prefer no CO2. So why don't we just build plenty of wind farms?
    • Let's be simple, 1. So why don't we just build plenty of wind farms? Because we need production = demand. Always. And we cannot give orders to winds.
    • Let's be simple, 2. “Because we need production = demand.” Why not production >= demand? Because otherwise, we destroy both production tools and electric appliances.
    • Let's be simple, 2. In case production > demand, an artificial demand for useless motors / heaters / …? Maybe... wasting energy for producing winds :-) But it's better to do storage, e.g. because sometimes there's no wind, no sun.
    • Let's be simple, 3: so we solve everything with storage? Hydroelectricity: - pumping water from bottom to top; - compressed air ==> but limited. Future: electric vehicles.
    • Other solutions than storage? Devices which can be more or less switched on/off on demand (e.g. electric vehicles, air conditioning, fridges, heaters...) ==> smart grids. Also: long distance connections (sharing resources, smoothing production and demand).
    • How is the future? Maybe much more electricity demand (electric vehicles?). Hopefully less coal (CO2 pollution). Shale gas, methane clathrate? Be careful :-) Wind farms ++. Concentrated solar plants. Photovoltaic units? Long distance connections. Nuclear or not?
    • Let's explain: “games”. We have rules, and a system evolves according to these rules. Uncertainties: - randomness; - adversarial (worst case).
    • Let's explain: partially observable. Weather = maybe theoretically a stochastic system, but not all variables are observed. From restricted variables, weather is partially observable.
    • Outline
      ● Complexity and ATM
      ● Complexity and games (incl. planning)
      ● Bounded horizon games
    • Classical complexity classes, including non-determinism
      P ⊂ NP ⊂ PSPACE ⊂ EXPTIME ⊂ NEXPTIME ⊂ EXPSPACE
      Proved: PSPACE ≠ EXPSPACE, P ≠ EXPTIME, NP ≠ NEXPTIME
      Believed, not proved: P ≠ NP, EXPTIME ≠ NEXPTIME, NEXPTIME ≠ EXPSPACE
    • Complexity and alternating Turing machines
      ● Turing machine (TM) = abstract computer
      ● Non-deterministic Turing machine (NTM) = TM with “exists” states (i.e. several transitions, accepts if at least one transition accepts)
      ● Co-NTM: TM with “for all” states (i.e. several transitions, accepts if all transitions lead to accept)
      ● ATM: TM with both “exists” and “for all” states.
    • Alternation
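To make the “exists” / “for all” acceptance rule concrete, here is a minimal sketch that evaluates a finite tree of alternating states. The tree encoding (a boolean for a halting configuration, a pair `(kind, children)` for a branching state) and the function name are assumptions chosen for illustration; a real ATM is defined on configurations of a machine, not on an explicit tree.

```python
def accepts(node):
    """Evaluate acceptance on a finite alternating computation tree.

    `node` is either a bool (halting configuration: accept or reject)
    or a pair (kind, children) with kind in {"exists", "forall"}.
    """
    if isinstance(node, bool):
        return node
    kind, children = node
    results = (accepts(child) for child in children)
    if kind == "exists":
        return any(results)   # at least one transition accepts
    if kind == "forall":
        return all(results)   # all transitions accept
    raise ValueError(f"unknown state kind: {kind}")

# A tree mixing both kinds of states, as in an ATM.
tree = ("exists", [("forall", [True, False]),
                   ("forall", [True, True])])
print(accepts(tree))
```

A tree using only "exists" states corresponds to the NTM rule, and one using only "forall" states to the co-NTM rule. Read as a two-player game, an "exists" state is a position where we choose the move and a "forall" state is a position where the opponent chooses.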
    • Outline
      ● Complexity and ATM
      ● Complexity and games (incl. planning)
      ● Bounded horizon games
    • Computational complexity: framework. Discrete time, uncertainty. Uncertainty can be stochastic or adversarial. Succinct representation or flat representations. Which representation is more natural? Probably succinct (one of the succinct...), but it's not always so easy...
    • Complexity, partial observation, infinite horizon
      ● 1P+random, unobservable: undecidable (Madani et al)
      ● 1P+random, P(win=1), or equivalently 2P, P(win=1): [Rintanen and refs therein]
        – Fully observable: EXP [Littman94]
        – Unobservable: EXPSPACE [Haslum et al 2000]
        – Partial observability: 2EXP
      Rmk: “2P, P(win=1)” is not “2P”!
    • Complexity, partial observation, infinite horizon
      ● 2P vs 1P: undecidable! [Hearn, Demaine]
      ● 2P (random or not):
        – Existence of sure win: equiv. to 1P+random!
          ● EXP fully observable (e.g. Go, Robson 1984)
          ● EXPSPACE unobservable
          ● 2EXP partially observable
        – Existence of sure win, same state forbidden: EXPSPACE-complete (Go with Chinese rules? rather conjectured EXPTIME or PSPACE...)
        – General case (optimal play): undecidable (Auger, Teytaud) (what about phantom-Go?)
    • Complexity, partial observation. Remarks:
      ● Continuous case?
      ● Purely epistemic (we gather information, we don't change the state)? [Sabbadin et al]
      ● Restrictions on the policy, on the set of actions...
      ● Discounted reward
      ● DEC-POMDP, POSG: many players, same/opposite/different reward functions...
    • Let's explain: distributed. If you work on a problem with a budget of ~ 500 billion euros, a cluster is not that expensive. Moreover, the problem is naturally multi-level: - high level = investments; - low level = management (horizon ~ 3 years, 2 weeks, 1 day, 1 minute).
    • Distributed nature of the problem. High level: optimization of the investments (horizon = 50 years). Lower level: simulation of the system, given investment strategies (lower level = parallelized). (The real case is a bit more complicated than that.)
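The two-level structure described above (a high-level search over investments, each candidate being scored by many independent low-level simulations run in parallel) can be sketched as follows. This is an illustrative toy under stated assumptions: the cost model, the names `simulate_system`, `evaluate` and `best_investment`, and all numbers are hypothetical, and a thread pool stands in for the cluster.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate_system(investment, scenario_seed):
    """Low level (hypothetical): cost of one simulated scenario."""
    rng = random.Random(scenario_seed)
    demand = rng.uniform(80.0, 120.0)          # random demand in this scenario
    shortfall = max(0.0, demand - investment)  # demand not covered by capacity
    return 1.0 * investment + 5.0 * shortfall  # building cost + shortfall penalty

def evaluate(investment, n_scenarios=200, workers=4):
    """Average cost of an investment over scenarios simulated in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(lambda seed: simulate_system(investment, seed),
                              range(n_scenarios)))
    return sum(costs) / len(costs)

def best_investment(candidates):
    """High level: pick the candidate investment with the lowest average cost."""
    return min(candidates, key=evaluate)

print(best_investment([0.0, 100.0, 200.0]))
```

The simulations are independent of each other, which is what makes the lower level easy to parallelize. In CPython, threads do not speed up CPU-bound code, so a real setup would rather use processes or several machines; the structure of the computation stays the same.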
    • Let's explain: multiobjective. One objective for each of several scenarios (climate change, fossil fuels, technologies...).
    • Let's explain: multiobjective. One objective for each of several risk levels (median, 5% worst, 1% worst, ...).
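One possible reading of “one objective for each of several risk levels” is to summarize the simulated costs of a policy by several statistics at once. The sketch below is an assumption about how such objectives could be computed, not the authors' definition: it takes the median and, for each risk level, the mean cost over the worst fraction of outcomes. The function name and the key names are hypothetical.

```python
def risk_objectives(costs, worst_fractions=(0.05, 0.01)):
    """Turn a sample of simulated costs into one objective per risk level.

    Returns the median cost, plus the mean cost over the worst 5% and the
    worst 1% of outcomes (higher cost = worse).
    """
    ordered = sorted(costs)
    n = len(ordered)
    objectives = {"median": ordered[n // 2]}
    for frac in worst_fractions:
        k = max(1, int(round(frac * n)))   # number of worst outcomes kept
        worst = ordered[-k:]
        objectives[f"mean of {frac:.0%} worst"] = sum(worst) / k
    return objectives

# Example with made-up costs 1, 2, ..., 100.
print(risk_objectives(list(range(1, 101))))
```

A multiobjective optimizer would then compare policies on the whole vector of values instead of a single expected cost.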
    • Research philosophy. Too industrial for Inria / Paris-Sud? In my humble opinion, no. Industrial research is good if:
      - it is widely applicable (it is!)
      - or it is visible and easy to operate (it is not... “games” are!)
      - or it is very important (would you like it if there was nobody from academia working numerically on this? ==> we are **the** neutral people...)
    • What are the approaches?
      – Dynamic programming (Massé – Bellman, 50s) (still the main approach in industry), alpha-beta, retrograde analysis
      – Reinforcement learning
      – MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006)
      – Scripts + Tuning / Direct Policy Search
      – Coevolution
    • What are the approaches? Dynamic programming, alpha-beta, retrograde analysis; reinforcement learning; MCTS; scripts + tuning / Direct Policy Search; coevolution ==> remove non-anytime tools
    • What are the approaches? Dynamic programming, alpha-beta, retrograde analysis; reinforcement learning; MCTS; scripts + tuning / Direct Policy Search; coevolution ==> remove unstable tools
    • What do we use? MCTS = start with a MC (random simulator), then optimize the simulations online depending on statistics (updates the near future). DPS = optimize a random simulator so that decisions become better (far-future effects correctly handled). Currently, we use MCTS with DPS as a MC tool.
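The description of MCTS above (start from random simulations, then bias them online using statistics) can be illustrated on a toy game. The sketch below is not the authors' code: the game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins), the UCB1 selection rule and all names and constants are assumptions chosen to keep the example short.

```python
import math
import random

class Node:
    def __init__(self, pile, player, parent=None, move=None):
        self.pile = pile        # stones left
        self.player = player    # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move        # move that led to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits = 0
        self.wins = 0.0         # wins of the player who moved INTO this node

def rollout(pile, player, rng):
    """Random simulation (the MC part); returns the winner."""
    while True:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return player       # this player took the last stone
        player = 1 - player

def uct_search(pile, iterations=3000, c=1.4, seed=0):
    """Return the most visited move at the root after `iterations` simulations."""
    rng = random.Random(seed)
    root = Node(pile, player=0)
    for _ in range(iterations):
        node = root
        # Selection: follow statistics (UCB1) while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # Expansion: add one new child if possible.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.pile - m, 1 - node.player, node, m)
            node.children.append(child)
            node = child
        # Simulation: random play from the new node (or read off a finished game).
        if node.pile == 0:
            winner = 1 - node.player
        else:
            winner = rollout(node.pile, node.player, rng)
        # Backpropagation: update statistics along the path.
        while node is not None:
            node.visits += 1
            if winner != node.player:
                node.wins += 1.0
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print(uct_search(5))
```

The random `rollout` is the plain Monte-Carlo simulator. In the combination mentioned on the slide, that simulator would itself be a policy tuned offline by DPS, which is not shown here. The loop can be stopped after any number of iterations, which is what makes the method anytime.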
    • Conclusions. Nice big problems in energy. Require collaborations (many models, data).
      ● Our role is not to conclude “(don't) use shale gas” or “(don't) use methane clathrate”
      ● Better: “if you use quantity XXX of clathrate and YYY of shale gas in conditions ZZZ, then the distribution of economic and ecological costs switches to ...”
    • Conclusions. Nice big problems in energy. Require collaborations. By the way, if you want to collaborate, people working numerically on this kind of stuff are more than welcome :-) Anytime algorithms are necessary, mixing between MCTS / DPS. There are still natural questions which are undecidable ==> decidability matters. Madani et al (1 player against random, no observability), extended here to 2 players with no random.
    • Open problems & targets. Phantom-Go undecidable? Complexity of Go with Chinese rules? (conjectured: PSPACE or EXPTIME; proved PSPACE-hard + EXPSPACE). A stable high-scale anytime platform for our energy management problems ==> if you like experimenting, join us :-)