This document discusses reinforcement learning. It begins with basic definitions and applications of reinforcement learning. It then discusses two families of methods: value-based methods, which estimate the value function and have an implicit policy, and policy-based (policy gradient) methods, which directly estimate the policy. Specific algorithms discussed include Q-learning, Sarsa, and policy gradients. Example applications provided include AlphaGo, robotics, healthcare, online trading, and scheduling.
32. Value-based methods
‣ Use policy and expected return to take action
‣ Estimate the value function
‣ Policy is implicit (e.g. ε-greedy)
‣ e.g. Sarsa, Q-learning
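The bullets above can be sketched in a few lines of Python. This is a minimal tabular sketch, not the presenter's code: the function names, the dictionary-based Q-table, and the parameters `alpha` (step size) and `gamma` (discount factor) are assumptions made for illustration, and terminal-state handling is left out. It shows how the policy stays implicit (ε-greedy over the estimated values) and how Q-learning and Sarsa differ only in the value they bootstrap from.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Implicit policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q.get((state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))


def q_learning_update(q, s, a, r, s_next, n_actions, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q.get((s_next, b), 0.0) for b in range(n_actions))
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually taken next."""
    old = q.get((s, a), 0.0)
    next_value = q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * next_value - old)
```

In a training loop, `epsilon_greedy` would choose each action and one of the two update functions would be called after every transition.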
38. Policy-based methods
$$\max_{\theta} \; \mathbb{E}\left[\sum_{t=0}^{H} R(s_t) \,\middle|\, \pi_{\theta}\right]$$
Policy
Changing a single action can have a big impact on the return
Changing the action distribution has a smaller, smoother impact
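One way to read the point about action distributions: because the objective is an expectation under a stochastic policy, it varies smoothly with the policy parameters and can be differentiated. The block below is a sketch of the standard likelihood-ratio (REINFORCE) form of that gradient. The names $J(\theta)$, the trajectory $\tau$, and the horizon $H$ are notation assumed here for illustration, and the identity assumes the environment dynamics do not depend on $\theta$.

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{H} R(s_t)\right],
\qquad
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\left[
      \left(\sum_{t=0}^{H} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\right)
      \left(\sum_{t=0}^{H} R(s_t)\right)
    \right]
```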
39. Policy-Based methods
‣ Estimate the policy
‣ No value function
‣ For simpler problems
‣ Innate exploration through its stochastic nature
‣ Can be used together with supervised learning
40. Policy gradient
‣ Recent success in video games, 3D locomotion, and Go
‣ Problems: sensitive to step size
‣ Slow progress
‣ Noise can mask the signal
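The problems listed above can be explored with a small experiment. The following is a minimal sketch of a REINFORCE-style update on a multi-armed bandit with a softmax policy, written for illustration only: the function names, the running-average baseline, and the parameters `step_size` and `noise` are assumptions, not something taken from the slides. Raising `noise` or changing `step_size` gives a feel for how noisy rewards can mask the signal and how sensitive the method is to the step size.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards, step_size, episodes, noise=1.0, seed=0):
    """Learn softmax preferences theta with a likelihood-ratio gradient."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)
    baseline = 0.0  # running average of past rewards, used to reduce variance
    for t in range(1, episodes + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(mean_rewards[action], noise)
        advantage = reward - baseline
        for a in range(len(theta)):
            # Gradient of log softmax: indicator(a == action) - probs[a]
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += step_size * advantage * grad_log
        baseline += (reward - baseline) / t
    return softmax(theta)
```

For example, `reinforce_bandit([0.0, 1.0], step_size=0.1, episodes=2000, noise=0.1)` returns the learned action probabilities for a two-armed bandit whose second arm pays more on average.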
42. Takeaway
‣ RL is useful
‣ Policy gradients have had a lot of success
‣ OpenAI's Gym is a great tool to test RL algorithms
43. Training
Modern Machine Learning and Deep Learning
2-day course covering real-life Deep Learning examples
https://www.eventbrite.co.uk/e/modern-machine-learning-and-deep-learning-2-day-course-tickets-49603205523?aff=ebdssbdestsearch
Use discount code: IDEAIFORME
44. Thank you!
You can contact me at
www.ideai.io info@ideai.io
Newsletter:
subscribe@ideai.io