The document describes a competition to develop reinforcement learning agents that control a simulated power grid, in support of the goal of moving towards 100% renewable energy sources by 2050. Participants' agents are evaluated on their ability to manage the power grid over time by making decisions about line status, topology changes, generator output, and storage usage. The competition's baseline agent combines PPO with expert rules; top participants have exceeded it through better training strategies and agent designs, and mixtures of experts appear a promising direction for further improvement.
Learning to Run a Power Network - a design challenge - TAILOR Conference - Prague 2022
1. Learning To Run
a Power Network
Sebastien Treguer, Marc Schoenauer
RL for Energies of the future and carbon neutrality:
a Challenge Design
2. Energy shift to reach carbon neutrality by 2050
● Unfortunately, solar and wind power come with
drawbacks:
○ Intermittent
○ Highly uncertain
● Electricity storage is limited and inefficient,
which favors controllable generators over
intermittent ones
3. Complexity of power network operations
Interventions from human experts:
● Dispatching to avoid power overflows
● Adapting the production injected into
the power network
● Modifying the amount of power
absorbed by storage units
● Limiting the amount of energy
injected by renewable generators
(such as wind or solar) in case of
overproduction
How to keep the power network stable at all times and in all places?
4. Competition design
● The physical simulation environment
is based on the Python module Grid2Op
● Chronics =
time series describing the electricity
injections into the power network
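The role of chronics can be illustrated with a toy sketch. The function name, the data layout, and the simple power-balance model below are illustrative only, not the real Grid2Op API:

```python
# Toy sketch: a "chronic" as a time series of injections driving a simulation.
# We dispatch controllable generation to cover the residual load (load minus
# intermittent production) at each time step.  Illustrative names only.

def run_chronic(chronic, controllable_capacity):
    """chronic: list of (load_mw, renewable_mw) tuples, one per time step.
    Returns the dispatch schedule for controllable generators, or None if
    capacity is exceeded (a blackout in this toy model)."""
    dispatch = []
    for load_mw, renewable_mw in chronic:
        # Residual load the controllable generators must supply:
        residual = max(0.0, load_mw - renewable_mw)
        if residual > controllable_capacity:
            return None  # demand cannot be met: blackout
        dispatch.append(residual)
    return dispatch

chronic = [(100.0, 60.0), (110.0, 95.0), (90.0, 100.0)]
print(run_chronic(chronic, controllable_capacity=50.0))  # [40.0, 15.0, 0.0]
```

The third step shows the overproduction case mentioned earlier: renewables exceed the load, so controllable dispatch drops to zero (and in the real setting curtailment may be needed).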
5. ● Competition with code submission
● A starting kit with a set of tools and tutorials, including:
○ Grid2Viz, power network visualization and diagnosis tools
○ A baseline code
○ A sample submission with the baseline
● A 3-phase competition protocol: Warmup, Development, Final test and Legacy
● Timeline:
○ Warmup phase from June 15th to July 4th
○ Development phase from July 5th to September 14th
○ Final phase from September 15th, with results revealed on September 30th
Competition design
6. ● Observation space: the complete state of the power network, including power nodes (electricity
produced and consumed), the flow on each power line, and more.
● Action space: four types of actions:
1. Line status (line connection/disconnection).
2. Topology changes (node splitting).
3. Power production changes/curtailment (of generators).
4. Storage changes (charging or discharging of batteries).
➔ 70,000 discrete actions (topology changes)
➔ 69-dimensional continuous action space (production changes).
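The four action types above can be sketched as one data structure. All field names below are illustrative, not the actual Grid2Op action API; the point is that line/topology actions are discrete while production and storage changes are continuous:

```python
# Toy encoding of the four action types.  Illustrative names only.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class GridAction:
    set_line_status: Dict[int, int] = field(default_factory=dict)  # line id -> +1 connect / -1 disconnect
    set_bus: Dict[int, int] = field(default_factory=dict)          # element id -> bus (node splitting)
    redispatch: List[float] = field(default_factory=list)          # per-generator MW change / curtailment
    set_storage: List[float] = field(default_factory=list)         # per-unit MW charge (+) or discharge (-)

# Disconnect line 3 and ask generator 0 for 10 MW more:
a = GridAction(set_line_status={3: -1}, redispatch=[10.0, 0.0])
print(a.set_line_status, a.redispatch)  # {3: -1} [10.0, 0.0]
```

The ~70,000 discrete actions correspond to enumerating the valid `set_line_status`/`set_bus` combinations, while `redispatch` and `set_storage` together form the 69-dimensional continuous part.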
Competition design
7. Evaluation metric
● Cost of energy losses: calculated by multiplying the amount of electricity lost to the
Joule effect by the current price per MWh.
● Cost of operation: Sum of the costs of the agent’s actions. Operations involving changes
in the production of electricity have a cost that depends on the energy market. The use of
batteries has a fixed cost per MWh.
● Cost of blackout: if the agent fails to operate the power network until the end of the
scenario, this cost is calculated by multiplying the amount of electricity left to supply by
the current price per MWh.
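The three cost terms combine additively. A minimal sketch, assuming a single market price per MWh (the `episode_cost` helper is hypothetical, not the official scoring code):

```python
# Sketch of the three cost terms of the evaluation metric.
def episode_cost(losses_mwh, price_per_mwh, operation_costs,
                 undelivered_mwh=0.0):
    """Total cost = losses priced at the market rate
                  + sum of the costs of the agent's actions
                  + blackout penalty on the energy left unserved."""
    cost_losses = losses_mwh * price_per_mwh
    cost_operations = sum(operation_costs)
    cost_blackout = undelivered_mwh * price_per_mwh
    return cost_losses + cost_operations + cost_blackout

# An episode with 12 MWh of Joule losses at 50 per MWh, two redispatch
# actions costing 30 and 45, and no blackout:
print(episode_cost(12.0, 50.0, [30.0, 45.0]))  # 675.0
```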
8. Previous Challenges
● The participants’ algorithms must
control a simulated power network
within a reinforcement learning framework.
● The simulated power network grows
more and more complex with each edition
A series of ML challenges since 2019
2019: Double Deep Q-Learning algorithm
2020 - 1: Action space reduction; initializes a
policy parameterized by a feed-forward neural
network, then refines it with evolutionary
black-box optimization
2020 - 2: A policy neural network to select
the Top-K actions, plus an optimization
algorithm to choose the best one
Winning solutions of previous editions
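The 2020 Top-K recipe can be sketched as follows. Everything here is mocked (the real solution evaluated candidates with the grid simulator, not a lookup table):

```python
# Sketch: a policy network scores all discrete actions; we keep the Top-K
# by policy score, evaluate each with the simulator, and play the safest.
import heapq

def choose_action(policy_scores, simulate, k=3):
    """policy_scores: action_id -> score from the policy network.
    simulate: action_id -> predicted max line loading (lower is safer)."""
    top_k = heapq.nlargest(k, policy_scores, key=policy_scores.get)
    return min(top_k, key=simulate)  # best action among the Top-K

scores = {0: 0.9, 1: 0.8, 2: 0.1, 3: 0.7}   # policy favors actions 0, 1, 3
sim = {0: 1.05, 1: 0.85, 2: 0.60, 3: 0.95}.get  # mocked simulator outcomes
print(choose_action(scores, sim, k=3))  # 1
```

Action 2 would have been the safest overall, but the policy prunes it away; among the Top-3, action 1 yields the lowest simulated loading. This shows why the policy and the simulation step complement each other.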
9. Why a new challenge?
● Updated simulator and data to represent
zero carbon scenarios. Chronics with less
than 3% of production from fossil fuels
● 32 years of scenarios available for training
● By 2030: reduce by half the
dependence on fossil fuels and
nuclear power
● By 2050: move towards 100%
renewable energy and zero carbon
New goals for the Île-de-France region
10. Baseline
● First checks whether "obvious actions" from expert rules
(actions of type 1: line reconnections) can already
improve the network state
● Excludes actions of type 2 (node splitting)
● The RL part of the agent focuses on continuous
actions: 3. curtailment and 4. storage units
● Based on the Proximal Policy Optimization (PPO)
algorithm
Goal: bootstrap the competition with a reasonably good
agent, while leaving room for improvement
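The baseline's decision hierarchy described above can be sketched as follows. All names are illustrative (the real baseline builds on Grid2Op and a PPO policy); `safe_max_rho` plays the role of the line-loading threshold tuned in the experiments:

```python
# Sketch of the baseline hierarchy: expert rules first (line reconnections),
# do nothing while the grid is safe, otherwise delegate to the RL policy,
# which only outputs continuous curtailment/storage actions.

def baseline_act(obs, ppo_policy, safe_max_rho=0.99):
    # 1) Expert rule: reconnect any line that is currently down.
    for line_id, connected in enumerate(obs["line_status"]):
        if not connected:
            return {"reconnect_line": line_id}
    # 2) If all line loadings (rho) are below the threshold, do nothing.
    if max(obs["rho"]) < safe_max_rho:
        return {}  # "do nothing" action
    # 3) Otherwise, let the PPO policy act (continuous actions only).
    return {"continuous": ppo_policy(obs)}

obs = {"line_status": [True, True], "rho": [1.10, 0.70]}
print(baseline_act(obs, ppo_policy=lambda o: [0.2, -0.1]))
# {'continuous': [0.2, -0.1]}
```

Node splitting (action type 2) never appears here, matching the baseline's exclusion of topology changes.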
11. Experiments
● 14 agents trained for 10 million iterations
● Smoothing the impact of RL actions by tuning 2 hyperparameters:
○ safe_max_rho
○ limit_cs_margin
● The baseline does significantly better than a Do-Nothing agent or expert rules only
● Training on "cherry-picked" scenarios seems to do better than training on all scenarios
available in the competition's public dataset
● One way to exploit this is to limit training to the most difficult week of the year
● More systematic experiments to optimize the training curriculum are under way
● Possible improvement: a mixture of experts
12. Results
● 15 teams, 261 submissions as of today (09/12/2022)
● Best results from participants are significantly better than our baseline
1. Maze-RL 66.55
2. Richard_wth 52.96
3. Rhapsody 47.03
● LRPN Baseline 22.46
● Expert rules only ~5
● Do nothing 0
13. ● This competition design stimulated contributions from people with various
backgrounds and no prior expertise in power grid control
● A competition accelerates progress, but its design must be chosen carefully to
stimulate progress in the right direction
● For this specific task of power grid control, combining a trained RL agent with
heuristic rules makes a reasonably good baseline, doing better than expert rules
only or RL only
● Teams of RL experts can beat the state of the art without prior knowledge of
power grid control
● A mixture of expert agents seems a promising direction for improvement
Conclusion
14. ● Gaetan Serré ¹
● Eva Boguslawski ¹ ³
● Benjamin Donnot ³
● Antoine Marot ³
Acknowledgments
1. LISN/CNRS/INRIA, Université Paris Saclay
2. Chalearn
3. RTE France
● Adrien Pavao ¹
● Isabelle Guyon ¹ ²
● Marc Schoenauer ¹
Links
Competition page
https://codalab.lisn.upsaclay.fr/competitions/5410
Paper
https://hal.archives-ouvertes.fr/hal-03726294v2/document