Evolution strategies in reinforcement learning
Borys Tymchenko
Rise of the Machines
Odessa, 2019
Agenda
– Brief intro to Reinforcement learning
  ● Value and Policy
  ● Exploration and exploitation
– Evolution Strategies
  ● Genetic algorithms
  ● Natural evolution
  ● Safe mutations
Recap: Reinforcement learning
● Markov decision process
  – States S
  – Actions A
  – Transitions P(s′ | s, a)
  – Rewards R(s, a, s′)
● Quantities
  – Policy – a map from states to actions
  – Utility – the sum of discounted rewards
  – Fitness (for ES) – the sum of non-discounted rewards
  – Quality – the expected utility of a state/action pair
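The distinction between utility and fitness above can be shown in a few lines. A minimal sketch; the reward sequence and the discount factor are made-up illustration values:

```python
def utility(rewards, gamma=0.99):
    """Sum of discounted rewards: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def fitness(rewards):
    """Sum of non-discounted rewards, as used by ES."""
    return sum(rewards)

# A sparse reward arriving at the last step is worth less under discounting.
rewards = [0.0, 0.0, 1.0]
print(utility(rewards))  # 0.99**2 = 0.9801
print(fitness(rewards))  # 1.0
```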
Credit assignment problem: Temporal
Exploitation vs exploration
Why not both?
Exploration methods in RL
● Simple
  – Inject noise into actions (ε-greedy policy)
  – Sample actions from the action distribution
  – Sample actions from Bayesian neural networks
  Problem: unusable for long-term planning
● Not that simple
  – Curiosity
  – Searching for hard regions
  – Novelty search
  – etc.
  Problem: usually hard and domain-dependent
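The simplest of the methods above, the ε-greedy policy, can be sketched in a few lines. This is an illustrative toy, assuming a discrete action space with precomputed Q-values:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a uniformly random action (exploration),
    otherwise pick the action with the highest Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

random.seed(0)
actions = [epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1) for _ in range(1000)]
print(actions.count(1) / len(actions))  # mostly the greedy action 1
```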
Problem examples
● Car race – continuous action space
● Balance tasks – discrete action space
Common problems:
– The reward arrives only at the end of the race
– A small error accumulated over time leads to an accident
– A big one-time error leads to an accident
Montezuma’s Revenge: some special sorcery required
Evolution strategies
Why use ES?
● Eliminates the (temporal) credit assignment problem by definition
● Exploration by definition
● Deals with noisy rewards simply by taking more samples
Credit assignment problem: Structural
● Highly non-linear
● Possibly millions of parameters
● How to assign credit to individual weights?
(Convolution kernels are flattened in the same manner)
Genetic Algorithms
Overall idea:
● Create a new population
● Select the fittest individuals (Selection)
● Mix them (Crossover)
● Change them (Mutation)
● Create a new population...
Genetic Algorithms
Overall idea: borrow from nature
● Selection: select the k fittest agents from the population into a mating pool
● Crossover: somehow mix the parameters of the selected agents
● Mutation: introduce noise to some agents
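One generation of the selection–crossover–mutation loop above might look like this. A minimal sketch, not the deck's exact algorithm: individuals are flat parameter lists, crossover is uniform, and the toy fitness function is an assumption for the demo:

```python
import random

def ga_step(population, fitness_fn, k=4, noise_std=0.1):
    """One generation: selection -> crossover -> mutation."""
    # Selection: keep the k fittest individuals as the mating pool.
    pool = sorted(population, key=fitness_fn, reverse=True)[:k]
    children = []
    while len(children) < len(population):
        a, b = random.sample(pool, 2)
        # Crossover: uniform mix of the two parents' parameters.
        child = [random.choice(pair) for pair in zip(a, b)]
        # Mutation: add Gaussian noise to each parameter.
        children.append([w + random.gauss(0.0, noise_std) for w in child])
    return children

random.seed(0)
pop = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
fit = lambda ind: -sum(w * w for w in ind)  # toy fitness: maximized at all zeros
for _ in range(50):
    pop = ga_step(pop, fit)
print(max(map(fit, pop)))
```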
Genetic Algorithms: Concerns
● Crossing over parts of different NNs is unpredictable
● Throwing out solutions is bad
● Mutation can disrupt sensitive parameters
Safe Layer-Blend Crossover
Overall idea: layer-wise linear blending, with a random blending parameter for each layer
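The layer-wise blend can be sketched directly. An illustrative toy in which a "network" is just a list of per-layer weight lists; the blending coefficient t per layer is the random parameter the slide refers to:

```python
import random

def layer_blend_crossover(parent_a, parent_b):
    """Blend two networks layer by layer:
    child_layer = t * a_layer + (1 - t) * b_layer, fresh random t per layer."""
    child = []
    for layer_a, layer_b in zip(parent_a, parent_b):
        t = random.random()  # one blending coefficient for the whole layer
        child.append([t * wa + (1 - t) * wb for wa, wb in zip(layer_a, layer_b)])
    return child

random.seed(1)
a = [[1.0, 1.0], [2.0, 2.0, 2.0]]
b = [[0.0, 0.0], [0.0, 0.0, 0.0]]
print(layer_blend_crossover(a, b))
```

Because each layer is blended as a whole, co-adapted weights inside a layer keep their relative structure, which is the "safe" part of the idea.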
Natural Evolution Strategies
Overall idea: throwing out bad agents is bad
● Sample many random perturbations of the policy parameters
● Evaluate each offspring in the environment
● Take the fitness-weighted average of the perturbations
● Take a step in the direction of the average perturbation
● Rinse and repeat
Not that natural
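The five steps above can be sketched as a single update, in the spirit of the OpenAI ES paper linked at the end. A minimal sketch with made-up hyperparameters and a toy fitness function standing in for environment rollouts:

```python
import random

def es_step(theta, fitness_fn, npop=50, sigma=0.1, alpha=0.05):
    """One update: sample perturbations, evaluate each offspring,
    step along the fitness-weighted average of the perturbations."""
    dim = len(theta)
    noise = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(npop)]
    fits = [fitness_fn([t + sigma * e for t, e in zip(theta, eps)])
            for eps in noise]
    baseline = sum(fits) / npop  # mean-fitness baseline reduces variance
    grad = [sum((f - baseline) * eps[j] for f, eps in zip(fits, noise)) / (npop * sigma)
            for j in range(dim)]
    return [t + alpha * g for t, g in zip(theta, grad)]

random.seed(0)
theta = [3.0, -2.0]
fit = lambda p: -sum(x * x for x in p)  # toy fitness: maximized at the origin
for _ in range(300):
    theta = es_step(theta, fit)
print(theta)  # ends up near [0, 0]
```

Note that no gradient of `fitness_fn` is ever taken; the weighted average of the noise is itself the gradient estimate, which is why this works on non-differentiable problems.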
Natural Evolution: Explanation
We can skip it
Sampling example
Natural Evolution: Summary
● No need for backpropagation!
● Highly and easily parallelizable
● Deals with non-differentiable problems
● Sample-inefficient
● Depends on the policy's initial parametrization
● Mutations break the subtle structure of DNNs
Safe Mutations
Uber: Safe Mutations with Rescaling
Overall idea: make the perturbation small in behavior space
● Perturb the parameters: θ′ = θ + αδ
● δ – a Gaussian noise vector the size of the parameter vector
● α – a scalar that changes the noise scale
Find α with a line search to keep the behavioral divergence low
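The rescaling idea can be sketched as follows. This is an illustrative toy, not Uber's implementation: the "policy" is a linear map, divergence is mean squared output change on reference states, and the line search is done by bisection on the scale α:

```python
import random

def behavior_divergence(theta, theta_new, states):
    """Mean squared change of the policy's outputs on reference states.
    Toy 'policy': action = dot(theta, state)."""
    act = lambda params, s: sum(p * x for p, x in zip(params, s))
    return sum((act(theta, s) - act(theta_new, s)) ** 2 for s in states) / len(states)

def safe_mutation_rescale(theta, states, target=0.01, steps=20):
    """Scale one Gaussian perturbation so the behavioral change
    stays at or below `target` (search over the scale, not the noise)."""
    delta = [random.gauss(0, 1) for _ in theta]
    lo, hi = 0.0, 1.0
    for _ in range(steps):  # bisection on the scale alpha
        alpha = (lo + hi) / 2
        cand = [t + alpha * d for t, d in zip(theta, delta)]
        if behavior_divergence(theta, cand, states) > target:
            hi = alpha
        else:
            lo = alpha
    return [t + lo * d for t, d in zip(theta, delta)]

random.seed(0)
theta = [0.5, -0.3]
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mutated = safe_mutation_rescale(theta, states)
print(behavior_divergence(theta, mutated, states))  # <= 0.01 by construction
```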
Uber: Safe Mutations with Gradients
Overall idea: make an informed choice of perturbation
● Perturb the parameters with noise scaled by per-parameter sensitivity
● δ – a Gaussian noise vector of genome size
● s – a sensitivity vector of genome size
Safe Mutations: Ways to compute sensitivity
● Use exact gradient information
● Use approximate gradient information
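For the exact-gradient variant, the sensitivity of parameter j can be taken as the norm of the output's gradient with respect to that parameter over a batch of reference states. A toy sketch, again assuming a linear policy y = dot(w, s) so the gradient ∂y/∂w_j is simply the input component x_j:

```python
import random

def sensitivity(states):
    """Per-parameter sensitivity for the toy linear policy y = dot(w, s):
    s_j = sqrt(sum over reference states of (dy/dw_j)^2) = sqrt(sum x_j^2)."""
    dim = len(states[0])
    return [max(1e-8, sum(s[j] ** 2 for s in states) ** 0.5) for j in range(dim)]

def safe_mutation_gradient(theta, states, scale=0.1):
    """SM-G sketch: divide Gaussian noise by each parameter's sensitivity,
    so parameters the output reacts to strongly get perturbed less."""
    sens = sensitivity(states)
    return [t + scale * random.gauss(0, 1) / s for t, s in zip(theta, sens)]

random.seed(0)
sens = sensitivity([[3.0, 1.0], [4.0, 1.0]])
print(sens)  # [5.0, sqrt(2)]: parameter 0 is far more sensitive
```

For a real deep network the same quantity would come from one backward pass per output; the approximate variant replaces the exact gradient with finite differences.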
Evolution strategies: Limitations
Safe Mutations: Summary
● More sample-efficient than plain mutations
● Highly parallelizable, too
● Robust to parameter perturbations
● Mutation efficacy depends on the domain
Things to make it actually work
● Shape fitness to be more uniform
● Augment sparse fitness with auxiliary goals
● Combine with methods that avoid deceptive local optima
● Averaging over rollouts helps generalization
● Augment tasks to improve robustness
● A huge number of processors for domain evaluations
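The first point, shaping fitness to be more uniform, is commonly done with rank-based shaping: replace raw fitness values by their centered ranks, so one outlier reward cannot dominate the update. A minimal sketch of that standard trick (the exact shaping used in the talk is not specified):

```python
def shape_fitness(fitnesses):
    """Rank-based fitness shaping: map raw fitnesses to centered ranks
    in [-0.5, 0.5], making updates invariant to the reward scale."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])  # indices, worst first
    shaped = [0.0] * n
    for rank, i in enumerate(order):
        shaped[i] = rank / (n - 1) - 0.5
    return shaped

print(shape_fitness([1.0, 100.0, 3.0]))  # [-0.5, 0.5, 0.0]
```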
Questions
Borys Tymchenko
borys.tymchenko@apostera.com
https://www.linkedin.com/in/spsancti/
Useful Links
● http://bit.ly/salimans
● http://bit.ly/deep-neuroevolution
● http://bit.ly/ddpg_keras
● http://bit.ly/var_opt
● http://bit.ly/openai_evo
