Labex2012g

Published in: Entertainment & Humor
  1. 1. DigiWorld. Distributed decision making: partially observable dynamic games and multiobjective policy optimization. Olivier.Teytaud@inria.fr + too many people to cite them all. Includes Inria, CNRS, Univ. Paris-Sud, LRI. TAO, Inria-Saclay IDF, CNRS 8623, LRI, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. DigiWorld, September 2012. In a nutshell: we optimize strategies, with parallel machines, and we test on games, and we apply to energy.
  2. 2. Intro: so many words... Distributed, decision making, partially observable, dynamic, games, multiobjective, policy, optimization.
  3. 3. Let's explain: decision making + policy optimization; dynamic games; partially observable; distributed; multiobjective. Decision making: it's all about making decisions. Humans in the loop, or not.
  4. 4. Let's explain. Policy: we provide policies. It's not graphical interfaces or data visualization, it's providing strategies.
  5. 5. Let's explain. Optimization: it's numerical. We have objective functions, and we optimize. It's science, not astrology.
  6. 6. Let's explain. Games: we have rules, and a system evolves according to these rules. - Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment).
  11. 11. Let's explain. Games: we have rules, and a system evolves according to these rules. - Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment). Yes, MineSweeper is really important.
  12. 12. Let's explain. Games: we have rules, and a system evolves according to these rules. - Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment). Nearly nobody trusts an industrial experiment (in particular if effects are supposed to be a reduction of risk for horizon 50 years...).
  13. 13. Let's explain. Games: we have rules, and a system evolves according to these rules. - Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment). Nearly nobody trusts an industrial experiment (in particular if effects are supposed to be a reduction of risk for horizon 50 years...). But many people trust an experiment on games.
  14. 14. Let's explain. Games: we have rules, and a system evolves according to these rules. - Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment). First wins against professional players for the game of Go ==> opened various doors for us (we are very grateful to strong pros like Kim Myung-Wang!)
  15. 15. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants.
  16. 16. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants. Renewable energy.
  17. 17. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants. Nuclear power plant.
  18. 18. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants. Coal.
  19. 19. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants. Hydroelectric power plant.
  20. 20. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants. Hydroelectric power plant: involves state variables (stock levels).
  21. 21. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants + electricity consumers. Depends on weather, economy, ...
  22. 22. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants + electricity consumers + electric network of lines. Capacity. Demand = Production (certainly not just production >= demand!)
  23. 23. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants + electricity consumers + electric network of lines. Capacity. Demand = Production (certainly not just production >= demand!) So we have state variables, uncertainties, time steps, long term effects... ==> this is termed a dynamic game.
  24. 24. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants + electricity consumers + electric network + weather. Can be modeled by a probability distribution ==> not adversarial uncertainty.
  25. 25. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants + electricity consumers + electric network + weather + economy. Modeled by a probability distribution?
  26. 26. Let's explain. Games: we have rules, and a system evolves according to these rules. Games: - Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment) - Industrial stuff: group of power plants + electricity consumers + electric network + weather + economy + technical innovations. In particular, smart grids! Worst case maybe better than probabilistic models; adversarial uncertainty.
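The ingredients listed in these slides (a stock level as state variable, random weather and demand, time steps, and the constraint production = demand) can be put together in a toy simulator. The sketch below is only an illustration under stated assumptions: the names `simulate`, `greedy` and `no_hydro`, the uniform distributions, the quadratic thermal cost and the fully observed demand are all hypothetical and do not come from the talk.

```python
import random

def simulate(policy, horizon=52, capacity=100.0, seed=0):
    """Run one trajectory of a toy hydro + thermal system; return the total thermal cost."""
    rng = random.Random(seed)
    stock = capacity / 2.0                  # state variable: water stock level
    total_cost = 0.0
    for t in range(horizon):                # time steps (e.g. weeks)
        inflow = rng.uniform(0.0, 10.0)     # uncertainty: weather (assumed uniform)
        demand = rng.uniform(5.0, 15.0)     # uncertainty: consumers (assumed uniform)
        stock = min(capacity, stock + inflow)
        # The policy proposes a hydro release; clip it to what is feasible.
        hydro = max(0.0, min(policy(t, stock, demand), stock, demand))
        thermal = demand - hydro            # thermal fills the gap: production == demand
        stock -= hydro
        total_cost += thermal ** 2          # convex thermal cost (assumption)
    return total_cost

def greedy(t, stock, demand):
    """Use as much water as possible right now."""
    return demand

def no_hydro(t, stock, demand):
    """Never release water: all demand is met by thermal production."""
    return 0.0

if __name__ == "__main__":
    print("greedy  :", simulate(greedy))
    print("no hydro:", simulate(no_hydro))
```

A policy here is just a function from (time, stock, demand) to a release; comparing policies on the same seed keeps the random weather and demand identical across runs.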
  27. 27. Climate change, peak oil, pollution, nuclear wastes... Important problems. We want to work numerically on this.
  28. 28. Let's be simple, 1. We want electricity. We prefer no nuclear waste. We prefer no CO2. So why don't we just build plenty of wind farms?
  29. 29. Let's be simple, 1. So why don't we just build plenty of wind farms? Because we need production = demand. Always. And we cannot give orders to winds.
  30. 30. Let's be simple, 2. “Because we need production = demand”. Why not production >= demand? Because otherwise, we destroy both production tools and electric appliances.
  31. 31. Let's be simple, 2. In case production > demand, an artificial demand for useless motors / heaters / …? Maybe... wasting energy for producing winds :-) But it's better to do storage. E.g. because sometimes there's no wind, no sun.
  32. 32. Let's be simple, 3: so we solve everything with storage? Hydroelectricity: - Pumping water from bottom to top. - Compressed air ==> but limited. Future: electric vehicles.
  33. 33. Other solutions than storage? Devices which can be more or less switched on/off on demand (e.g. electric vehicles, air conditioning, fridges, heaters...) ==> smart grids. Also: long distance connections (sharing resources, smoothing production and demand).
  34. 34. How is the future? Maybe much more electricity demand (electric vehicles?). Hopefully less coal (CO2 pollution). Shale gas, methane clathrate? Be careful :-) Wind farms ++. Concentrated solar plants. Photovoltaic units? Long distance connections. Nuclear or not?
  35. 35. Let's explain. “Games”: we have rules, and a system evolves according to these rules. Uncertainties: - randomness - adversarial (worst case).
  36. 36. Let's explain. Partially observable: weather = maybe theoretically a stochastic system, but not all variables are observed. From restricted variables, weather is partially observable.
  37. 37. Outline ● Complexity and ATM ● Complexity and games (incl. planning) ● Bounded horizon games
  38. 38. Classical complexity classes, including non-determinism: P ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ NEXPTIME ⊆ EXPSPACE. Proved: PSPACE ≠ EXPSPACE; P ≠ EXPTIME; NP ≠ NEXPTIME. Believed, not proved: P ≠ NP; EXPTIME ≠ NEXPTIME; NEXPTIME ≠ EXPSPACE.
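As a reading aid, the chain and the three proved separations can be written out with the standard theorems that give them. The attribution to the hierarchy theorems is textbook material added here, not something stated on the slide.

```latex
\begin{align*}
&\mathrm{P} \subseteq \mathrm{NP} \subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME}
  \subseteq \mathrm{NEXPTIME} \subseteq \mathrm{EXPSPACE} \\
&\mathrm{P} \subsetneq \mathrm{EXPTIME}
  && \text{(deterministic time hierarchy theorem)} \\
&\mathrm{NP} \subsetneq \mathrm{NEXPTIME}
  && \text{(nondeterministic time hierarchy theorem)} \\
&\mathrm{PSPACE} \subsetneq \mathrm{EXPSPACE}
  && \text{(space hierarchy theorem)}
\end{align*}
```

One consequence: since P ≠ EXPTIME, at least one of the inclusions P ⊆ NP ⊆ PSPACE ⊆ EXPTIME must be strict, even though none of them is individually known to be.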
  39. 39. Complexity and alternating Turing machines ● Turing machine (TM) = abstract computer ● Non-deterministic Turing machine (NTM) = TM with “exists” states (i.e. several transitions, accepts if at least one transition accepts) ● Co-NTM: TM with “for all” states (i.e. several transitions, accepts if all transitions lead to accept) ● ATM: TM with both “exists” and “for all” states.
  43. 43. Alternation
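The acceptance rule of an alternating machine can be illustrated on an explicit computation tree: an “exists” node accepts if at least one child accepts, a “for all” node accepts if every child accepts. The sketch below is a minimal illustration; the `Node` class and the string labels are hypothetical names chosen for this example, and a real ATM would generate the tree from its transition function rather than receive it ready-made.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                                   # "exists", "forall", "accept" or "reject"
    children: List["Node"] = field(default_factory=list)

def accepts(node: Node) -> bool:
    """Evaluate an alternating computation tree bottom-up."""
    if node.kind == "accept":
        return True
    if node.kind == "reject":
        return False
    results = (accepts(child) for child in node.children)
    if node.kind == "exists":                   # NTM-style state: one accepting branch suffices
        return any(results)
    if node.kind == "forall":                   # co-NTM-style state: all branches must accept
        return all(results)
    raise ValueError(f"unknown node kind: {node.kind}")

if __name__ == "__main__":
    ACCEPT, REJECT = Node("accept"), Node("reject")
    tree = Node("exists", [
        Node("forall", [ACCEPT, REJECT]),       # this branch fails
        Node("forall", [ACCEPT, ACCEPT]),       # this branch succeeds
    ])
    print(accepts(tree))
```

Read as a two-player game, “exists” nodes are positions where we choose a move and “for all” nodes are positions where the opponent does, which is why alternation is the natural model for the game complexity results that follow.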
  44. 44. Outline ● Complexity and ATM ● Complexity and games (incl. planning) ● Bounded horizon games
  45. 45. Computational complexity: framework. Discrete time, uncertainty. Uncertainty can be stochastic or adversarial. Succinct representation or flat representations. Which representation is more natural? Probably succinct (one of the succinct...), but it's not always so easy...
  46. 46. Complexity, partial observation, infinite horizon ● 1P+random, unobservable: undecidable (Madani et al) ● 1P+random, P(win=1), or equivalently 2P, P(win=1): [Rintanen and refs therein] – Fully observable: EXP [Littman94] – Unobservable: EXPSPACE [Hasslum et al 2000] – Partial observability: 2EXP. Rmk: “2P, P(win=1)” is not “2P”!
  47. 47. Complexity, partial observation, infinite horizon ● 2P vs 1P: undecidable! [Hearn, Demaine] ● 2P (random or not): – Existence of sure win: equiv. to 1P+random! ● EXP fully observable (e.g. Go, Robson 1984) ● EXPSPACE unobservable ● 2EXP partially observable – Existence of sure win, same state forbidden: EXPSPACE-complete (Go with Chinese rules? rather conjectured EXPTIME or PSPACE...) – General case (optimal play): undecidable (Auger, Teytaud) (what about phantom-Go?)
  48. 48. Complexity, partial observation. Remarks: ● Continuous case? ● Purely epistemic (we gather information, we don't change the state)? [Sabbadin et al] ● Restrictions on the policy, on the set of actions... ● Discounted reward ● DEC-POMDP, POSG: many players, same/opposite/different reward functions...
  49. 49. Let's explain. Distributed: if you work on a problem with budget ~ 500 billion euros, a cluster is not that expensive. Moreover, the problem is naturally multi-level: - High level = investments - Low level = management (horizon ~ 3 years, 2 weeks, 1 day, 1 minute).
  50. 50. Distributed nature of the problem. High level: optimization of the investments (horizon = 50 years). Lower level: simulation of the system, given investment strategies (lower level = parallelized). (Real case a bit more complicated than that.)
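The two-level structure (a high-level search over investments, each candidate being scored by many low-level simulations that run in parallel) can be sketched as follows. This is a toy under stated assumptions: `simulate_operation`, its cost model and the candidate values are hypothetical, and a thread pool stands in for the cluster (for CPU-bound simulations in Python, a process pool or a real distributed scheduler would be the realistic choice).

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate_operation(investment, scenario_seed):
    """Low level: toy operating cost for one random scenario (hypothetical model)."""
    rng = random.Random(scenario_seed)
    demand = rng.uniform(80.0, 120.0)           # assumed demand distribution
    shortfall = max(0.0, demand - investment)   # demand not covered by installed capacity
    return 1.0 * investment + 10.0 * shortfall  # building cost + penalty for shortfall

def evaluate(investment, n_scenarios=200, workers=4):
    """Average low-level cost of one investment level; scenarios are simulated in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(lambda seed: simulate_operation(investment, seed),
                              range(n_scenarios)))
    return sum(costs) / len(costs)

def optimize_investment(candidates):
    """High level: pick the candidate investment with the lowest simulated average cost."""
    return min(candidates, key=evaluate)

if __name__ == "__main__":
    print(optimize_investment([0.0, 100.0, 500.0]))
```

The high level only ever sees the aggregated score, so the low-level simulator can be swapped or scaled out without touching the outer search.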
  51. 51. Let's explain. Multiobjective: one objective for each of several scenarios (climate change, fossil fuels, technologies...).
  52. 52. Let's explain. Multiobjective: one objective for each of several risk levels (median, 5% worst, 1% worst, ...).
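One way to turn “one objective per risk level” into numbers is to compute, from a sample of simulated costs, the median and the average over the worst 5% and 1% of outcomes. The sketch below assumes that reading (a tail average, similar to a conditional value at risk); the slide does not say whether “5% worst” means a tail average or a quantile, and the function name `risk_objectives` is hypothetical.

```python
import statistics

def risk_objectives(costs, tail_fractions=(0.05, 0.01)):
    """Return (median, {fraction: mean cost over the worst fraction of outcomes})."""
    ordered = sorted(costs)                          # ascending: worst (highest) costs last
    tails = {}
    for frac in tail_fractions:
        k = max(1, int(round(frac * len(ordered))))  # number of worst outcomes to average
        worst = ordered[-k:]
        tails[frac] = sum(worst) / k
    return statistics.median(ordered), tails

if __name__ == "__main__":
    sample = list(range(1, 101))                     # stand-in for simulated costs
    median, tails = risk_objectives(sample)
    print(median, tails)
```

Each returned value can then be treated as a separate objective, so that a policy is judged both on typical outcomes and on rare bad ones.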
  53. 53. Research philosophy. Too industrial for Inria / Paris-Sud? In my humble opinion, no. Industrial research is good if: - it is widely applicable (it is!) - or it is visible and easy to operate (it is not... “games” are!) - or it is very important (would you like it if there was nobody from academia working numerically on this? ==> we are **the** neutral people...)
  54. 54. What are the approaches ? – Dynamic programming (Massé – Bellman 50s) (still the main approach in industry), alpha-beta, retrograde analysis – Reinforcement learning – MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006) – Scripts + Tuning / Direct Policy Search – Coevolution
  55. 55. What are the approaches ? – Dynamic programming (Massé – Bellman 50s) (still the main approach in industry), alpha-beta, retrograde analysis – Reinforcement learning – MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006) – Scripts + Tuning / Direct Policy Search – Coevolution ==> remove non-anytime tools
  56. 56. What are the approaches ? – Dynamic programming (Massé – Bellman 50s) (still the main approach in industry), alpha-beta, retrograde analysis – Reinforcement learning – MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006) – Scripts + Tuning / Direct Policy Search – Coevolution ==> remove unstable tools
  58. 58. What do we use? MCTS = - start with a MC (Monte-Carlo, i.e. random) simulator - optimize the simulations online, depending on statistics (updates the near future). DPS (Direct Policy Search) = optimize a random simulator so that decisions become better (far future effects correctly handled). Currently, we use MCTS with DPS as the MC tool.
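The MCTS description above (start from random simulations, then bias them online using statistics) corresponds to the usual four steps: selection, expansion, simulation, backpropagation. The sketch below is a plain UCT variant on a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins). The game, the names `Node`, `rollout`, `uct_search` and the exploration constant are assumptions for illustration, not the authors' platform. The uniformly random `rollout` is the place where a DPS-tuned policy would be plugged in as the Monte-Carlo tool.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones          # stones left; the player to move takes 1..3
        self.parent = parent
        self.move = move              # move that led from the parent to this node
        self.children = []
        self.visits = 0
        self.wins = 0.0               # wins of the player who moved INTO this node

    def untried_moves(self):
        tried = {child.move for child in self.children}
        return [m for m in (1, 2, 3) if m <= self.stones and m not in tried]

def rollout(stones, rng):
    """Uniformly random playout; True if the player to move at `stones` wins."""
    if stones == 0:
        return False                  # the previous player took the last stone
    player = 0                        # 0 = player to move at the start of the playout
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player == 0
        player = 1 - player

def uct_search(root_stones, iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend with the UCB1 rule while the node is fully expanded.
        while node.stones > 0 and not node.untried_moves():
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one new child, if any move is still untried.
        moves = node.untried_moves()
        if moves:
            m = rng.choice(moves)
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        mover_wins = not rollout(node.stones, rng)
        # 4. Backpropagation: alternate the winner's viewpoint up to the root.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_wins else 0.0
            mover_wins = not mover_wins
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

if __name__ == "__main__":
    print(uct_search(5))              # move recommended from a 5-stone position
```

The search is anytime in the sense used in the talk: stopping after fewer iterations still returns the currently most-visited move.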
  59. 59. Conclusions. Nice big problems in energy. Require collaborations (many models, data). ● Our role is not to conclude “(don't) use shale gas” or “(don't) use methane clathrate” ● Better: “if you use quantity XXX of clathrate and YYY of shale gas in conditions ZZZ then the distribution of economical and ecological costs switches to ...”
  60. 60. Conclusions. Nice big problems in energy. Require collaborations. By the way, if you want to collaborate, people working numerically on this kind of stuff are more than welcome :-) Anytime algorithms are necessary, mixing MCTS and DPS. There are still natural questions which are undecidable ==> decidability matters. Madani et al (1 player against random, no observability), extended here to 2 players with no randomness.
  61. 61. Open problems & targets. Phantom-Go undecidable? Complexity of Go with Chinese rules? (conjectured: PSPACE or EXPTIME; proved: PSPACE-hard and in EXPSPACE). A stable large-scale anytime platform for our energy management problems ==> if you like experimenting, join us :-)
