Evolution strategies in reinforcement learning
Borys Tymchenko
Rise of the Machines
Odessa, 2019
Agenda
– Brief intro to Reinforcement learning
  ● Value and Policy
  ● Exploration and exploitation
– Evolution Strategies
  ● Genetic algorithms
  ● Natural evolution
  ● Safe mutations
Recap: Reinforcement learning
● Markov decision process
  – States S
  – Actions A
  – Transitions P(s′ | s, a)
  – Rewards R(s, a, s′)
● Quantities
  – Policy – a map from states to actions
  – Utility – the sum of discounted rewards
  – Fitness (for ES) – the sum of non-discounted rewards
  – Quality – the expected utility of a state/action pair
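The distinction between utility and fitness above can be shown in a few lines. A minimal sketch; the reward sequence and the discount factor are made-up illustration values:

```python
def utility(rewards, gamma=0.99):
    """Sum of discounted rewards: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def fitness(rewards):
    """Sum of non-discounted rewards, as used by ES."""
    return sum(rewards)

# A sparse reward arriving at the last step is worth less under discounting.
rewards = [0.0, 0.0, 1.0]
print(utility(rewards))  # 0.99**2 = 0.9801
print(fitness(rewards))  # 1.0
```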
Credit assignment problem: Temporal
Exploitation vs exploration
Why not both?
Exploration methods in RL
● Simple
  – Inject noise into actions (ε-greedy policy)
  – Sample actions from the action distribution
  – Sample actions from Bayesian neural networks
  Problem: unusable for long-term planning
● Not that simple
  – Curiosity
  – Searching for hard regions
  – Novelty search
  – etc.
  Problem: usually hard and domain-dependent
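The simplest of the methods above, the ε-greedy policy, can be sketched in a few lines. This is an illustrative toy, assuming a discrete action space with precomputed Q-values:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a uniformly random action (exploration),
    otherwise pick the action with the highest Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

random.seed(0)
actions = [epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1) for _ in range(1000)]
print(actions.count(1) / len(actions))  # mostly the greedy action 1
```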
Problem examples
● Car race – continuous action space
● Balance tasks – discrete action space
Common problems:
– The reward arrives only at the end of the race
– A small error accumulated over time leads to an accident
– A big one-time error leads to an accident
Montezuma’s Revenge: some special sorcery required
Evolution strategies
Why use ES?
● Eliminates the (temporal) credit assignment problem by definition
● Exploration by definition
● Deals with noisy rewards simply by taking more samples
Credit assignment problem: Structural
● Highly non-linear
● Possibly millions of parameters
● How to assign credit to individual weights?
(Convolution kernels are flattened in the same manner)
Genetic Algorithms
Overall idea:
● Create a new population
● Select the fittest individuals (Selection)
● Mix them (Crossover)
● Change them (Mutation)
● Create a new population...
Genetic Algorithms
Overall idea: borrow from nature
● Selection: select the k fittest agents from the population into a mating pool
● Crossover: somehow mix the parameters of the selected agents
● Mutation: introduce noise to some agents
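One generation of the selection–crossover–mutation loop above might look like this. A minimal sketch, not the deck's exact algorithm: individuals are flat parameter lists, crossover is uniform, and the toy fitness function is an assumption for the demo:

```python
import random

def ga_step(population, fitness_fn, k=4, noise_std=0.1):
    """One generation: selection -> crossover -> mutation."""
    # Selection: keep the k fittest individuals as the mating pool.
    pool = sorted(population, key=fitness_fn, reverse=True)[:k]
    children = []
    while len(children) < len(population):
        a, b = random.sample(pool, 2)
        # Crossover: uniform mix of the two parents' parameters.
        child = [random.choice(pair) for pair in zip(a, b)]
        # Mutation: add Gaussian noise to each parameter.
        children.append([w + random.gauss(0.0, noise_std) for w in child])
    return children

random.seed(0)
pop = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
fit = lambda ind: -sum(w * w for w in ind)  # toy fitness: maximized at all zeros
for _ in range(50):
    pop = ga_step(pop, fit)
print(max(map(fit, pop)))
```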
Genetic Algorithms: Concerns
● Crossing over parts of different NNs is unpredictable
● Throwing out solutions is bad
● Mutation can disrupt sensitive parameters
Safe Layer-Blend Crossover
Overall idea: layer-wise linear blending, with a random blending parameter for each layer
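The layer-wise blend can be sketched directly. An illustrative toy in which a "network" is just a list of per-layer weight lists; the blending coefficient t per layer is the random parameter the slide refers to:

```python
import random

def layer_blend_crossover(parent_a, parent_b):
    """Blend two networks layer by layer:
    child_layer = t * a_layer + (1 - t) * b_layer, fresh random t per layer."""
    child = []
    for layer_a, layer_b in zip(parent_a, parent_b):
        t = random.random()  # one blending coefficient for the whole layer
        child.append([t * wa + (1 - t) * wb for wa, wb in zip(layer_a, layer_b)])
    return child

random.seed(1)
a = [[1.0, 1.0], [2.0, 2.0, 2.0]]
b = [[0.0, 0.0], [0.0, 0.0, 0.0]]
print(layer_blend_crossover(a, b))
```

Because each layer is blended as a whole, co-adapted weights inside a layer keep their relative structure, which is the "safe" part of the idea.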
Natural Evolution Strategies
Overall idea: throwing out bad agents is bad
● Sample many random perturbations of the policy parameters
● Evaluate each offspring in the environment
● Take the fitness-weighted average of the perturbations
● Take a step in the direction of the average perturbation
● Rinse and repeat
Not that natural
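The five steps above can be sketched as a single update, in the spirit of the OpenAI ES paper linked at the end. A minimal sketch with made-up hyperparameters and a toy fitness function standing in for environment rollouts:

```python
import random

def es_step(theta, fitness_fn, npop=50, sigma=0.1, alpha=0.05):
    """One update: sample perturbations, evaluate each offspring,
    step along the fitness-weighted average of the perturbations."""
    dim = len(theta)
    noise = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(npop)]
    fits = [fitness_fn([t + sigma * e for t, e in zip(theta, eps)])
            for eps in noise]
    baseline = sum(fits) / npop  # mean-fitness baseline reduces variance
    grad = [sum((f - baseline) * eps[j] for f, eps in zip(fits, noise)) / (npop * sigma)
            for j in range(dim)]
    return [t + alpha * g for t, g in zip(theta, grad)]

random.seed(0)
theta = [3.0, -2.0]
fit = lambda p: -sum(x * x for x in p)  # toy fitness: maximized at the origin
for _ in range(300):
    theta = es_step(theta, fit)
print(theta)  # ends up near [0, 0]
```

Note that no gradient of `fitness_fn` is ever taken; the weighted average of the noise is itself the gradient estimate, which is why this works on non-differentiable problems.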
Natural Evolution: Explanation
We can skip it
Sampling example
Natural Evolution: Summary
● No need for backpropagation!
● Highly and easily parallelizable
● Deals with non-differentiable problems
● Sample-inefficient
● Depends on the policy's initial parametrization
● Mutations break the subtle structure of DNNs
Safe Mutations
Uber: Safe Mutations with Rescaling
Overall idea: make the perturbation small in behavior space
● Perturb the parameters: θ′ = θ + αδ
● δ – a Gaussian noise vector the size of the parameter vector
● α – a scalar that changes the noise scale
Find α with a line search to keep the behavioral divergence low
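The rescaling idea can be sketched as follows. This is an illustrative toy, not Uber's implementation: the "policy" is a linear map, divergence is mean squared output change on reference states, and the line search is done by bisection on the scale α:

```python
import random

def behavior_divergence(theta, theta_new, states):
    """Mean squared change of the policy's outputs on reference states.
    Toy 'policy': action = dot(theta, state)."""
    act = lambda params, s: sum(p * x for p, x in zip(params, s))
    return sum((act(theta, s) - act(theta_new, s)) ** 2 for s in states) / len(states)

def safe_mutation_rescale(theta, states, target=0.01, steps=20):
    """Scale one Gaussian perturbation so the behavioral change
    stays at or below `target` (search over the scale, not the noise)."""
    delta = [random.gauss(0, 1) for _ in theta]
    lo, hi = 0.0, 1.0
    for _ in range(steps):  # bisection on the scale alpha
        alpha = (lo + hi) / 2
        cand = [t + alpha * d for t, d in zip(theta, delta)]
        if behavior_divergence(theta, cand, states) > target:
            hi = alpha
        else:
            lo = alpha
    return [t + lo * d for t, d in zip(theta, delta)]

random.seed(0)
theta = [0.5, -0.3]
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mutated = safe_mutation_rescale(theta, states)
print(behavior_divergence(theta, mutated, states))  # <= 0.01 by construction
```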
Uber: Safe Mutations with Gradients
Overall idea: make an informed choice of perturbation
● Perturb the parameters with noise scaled by per-parameter sensitivity
● δ – a Gaussian noise vector of genome size
● s – a sensitivity vector of genome size
Safe Mutations: Ways to compute sensitivity
● Use exact gradient information
● Use approximate gradient information
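For the exact-gradient variant, the sensitivity of parameter j can be taken as the norm of the output's gradient with respect to that parameter over a batch of reference states. A toy sketch, again assuming a linear policy y = dot(w, s) so the gradient ∂y/∂w_j is simply the input component x_j:

```python
import random

def sensitivity(states):
    """Per-parameter sensitivity for the toy linear policy y = dot(w, s):
    s_j = sqrt(sum over reference states of (dy/dw_j)^2) = sqrt(sum x_j^2)."""
    dim = len(states[0])
    return [max(1e-8, sum(s[j] ** 2 for s in states) ** 0.5) for j in range(dim)]

def safe_mutation_gradient(theta, states, scale=0.1):
    """SM-G sketch: divide Gaussian noise by each parameter's sensitivity,
    so parameters the output reacts to strongly get perturbed less."""
    sens = sensitivity(states)
    return [t + scale * random.gauss(0, 1) / s for t, s in zip(theta, sens)]

random.seed(0)
sens = sensitivity([[3.0, 1.0], [4.0, 1.0]])
print(sens)  # [5.0, sqrt(2)]: parameter 0 is far more sensitive
```

For a real deep network the same quantity would come from one backward pass per output; the approximate variant replaces the exact gradient with finite differences.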
Evolution strategies: Limitations
Safe Mutations: Summary
● More sample-efficient than plain mutations
● Highly parallelizable, too
● Robust to parameter perturbations
● Mutation efficacy depends on the domain
Things to make it actually work
● Shape fitness to be more uniform
● Augment sparse fitness with auxiliary goals
● Combine with methods that avoid deceptive local optima
● Averaging over rollouts helps generalization
● Augment tasks to improve robustness
● A huge number of processors for domain evaluations
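The first point, shaping fitness to be more uniform, is commonly done with rank-based shaping: replace raw fitness values by their centered ranks, so one outlier reward cannot dominate the update. A minimal sketch of that standard trick (the exact shaping used in the talk is not specified):

```python
def shape_fitness(fitnesses):
    """Rank-based fitness shaping: map raw fitnesses to centered ranks
    in [-0.5, 0.5], making updates invariant to the reward scale."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])  # indices, worst first
    shaped = [0.0] * n
    for rank, i in enumerate(order):
        shaped[i] = rank / (n - 1) - 0.5
    return shaped

print(shape_fitness([1.0, 100.0, 3.0]))  # [-0.5, 0.5, 0.0]
```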
Questions
Borys Tymchenko
borys.tymchenko@apostera.com
https://www.linkedin.com/in/spsancti/
Useful Links
● http://bit.ly/salimans
● http://bit.ly/deep-neuroevolution
● http://bit.ly/ddpg_keras
● http://bit.ly/var_opt
● http://bit.ly/openai_evo
