RL Applications, Policy Methods & AlphaGo


This document discusses reinforcement learning. It begins with basic definitions and applications of reinforcement learning. It then covers the main families of algorithms: value-based methods, which estimate the value function and have an implicit policy, and policy-based (policy gradient) methods, which estimate the policy directly. Specific algorithms discussed include Q-learning, Sarsa, and policy gradients. Example applications include AlphaGo, robotics, healthcare, online trading, and scheduling.


Reinforcement learning
#AIEnsamble
London Artificial Intelligence Meetup

Reinforcement learning
‣ Basic definitions
‣ Applications
‣ Policy based methods
Theory Demo
‣ OpenAI

MACHINE LEARNING
SUPERVISED REINFORCEMENT UNSUPERVISED

Reinforcement learning
agent
environment
feedback
action

AlphaGo

Why it matters
‣ Text summarisation engines
‣ Dialog agents (text, speech)
‣ Learning optimal treatment policies in healthcare
‣ Online stock trading
‣ Scheduling
‣ …

Why it matters
‣ To learn how to make decisions to achieve a specific goal

Interesting Applications
https://www.youtube.com/watch?v=oo0TraGu6QY
https://www.youtube.com/watch?time_continue=72&v=TmPfTpjtdgg
https://www.youtube.com/watch?v=UZHTNBMAfAA
https://www.youtube.com/watch?time_continue=118&v=eHipy_j29Xw

The Problem
agent
environment
feedback
action

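‣ Markov Decision Process
‣ At time t the agent observes state S_t and reward R_t, then takes action A_t
‣ The environment responds with reward R_{t+1} and next state S_{t+1}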
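
The agent-environment loop of a Markov Decision Process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the talk: the `CorridorEnv` class, its reward of +10 at the goal, and the `run_episode` helper are hypothetical names invented for the example.

```python
class CorridorEnv:
    """Hypothetical toy MDP: states 0..4 on a line, start at 0, goal at 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back at the start and return the initial state S_0."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action A_t (0 = left, 1 = right); return (S_{t+1}, R_{t+1}, done)."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 10 if done else 0  # assumed reward: +10 only when the goal is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward plus the visited transitions."""
    state = env.reset()
    total_reward = 0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                       # agent picks A_t from S_t
        next_state, reward, done = env.step(action)  # environment answers
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return 1


total_reward, trajectory = run_episode(CorridorEnv(), always_right)
print(total_reward, trajectory)
```

Each tuple in `trajectory` is one pass around the loop: the state the agent saw, the action it took, and the reward and next state the environment returned.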

Environment
‣ Multiple states
‣ Complex reward function

Feedback
‣ Returned by the environment as a numeric reward, e.g. +10 or +1

Value-based methods
‣ Use the policy and the expected return to choose an action
‣ Estimate the value function
‣ Policy is implicit (e.g. 𝜀-greedy)
‣ Examples: Sarsa, Q-learning
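
To make the value-based idea concrete, here is a minimal sketch of tabular Q-learning with an 𝜀-greedy policy. The five-state corridor, the +10 goal reward, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration; they do not come from the slides.

```python
import random

N_STATES, GOAL = 5, 4  # hypothetical corridor: states 0..4, goal at 4
ACTIONS = (0, 1)       # 0 = left, 1 = right


def step(state, action):
    """Deterministic toy dynamics; reward +10 only when the goal is reached."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = nxt == GOAL
    return nxt, (10.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """The implicit policy: mostly greedy on Q, random with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q(s, a) table, initialised to 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Sarsa would differ only in the target: instead of `max(q[nxt])` it would use the Q-value of the action actually chosen in the next state by the same 𝜀-greedy policy.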

Value-based methods
Q-function
policy
environment
feedback
action

Policy gradients
𝝅𝜽(a|s)
environment
feedback
action
𝝅𝜽(a|s) = probability of action a in state s
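
One common way to represent a stochastic policy 𝝅𝜽(a|s) is a softmax over per-state action preferences. The sketch below assumes that tabular softmax form, which the slides do not specify; the `theta` table and the state names `"s0"` and `"s1"` are made up for the example.

```python
import math
import random


def softmax_policy(theta, state):
    """Return pi_theta(a | state) for every action a.

    theta[state][a] is an unnormalised preference (logit) for action a.
    """
    logits = theta[state]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(theta, state, rng):
    """Draw an action according to the probabilities pi_theta(. | state)."""
    probs = softmax_policy(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical parameters: no preference in s0, a preference for action 0 in s1.
theta = {"s0": [0.0, 0.0], "s1": [2.0, 0.0]}

probs_s0 = softmax_policy(theta, "s0")
probs_s1 = softmax_policy(theta, "s1")
action = sample_action(theta, "s1", random.Random(0))
print(probs_s0, probs_s1, action)
```

Because actions are sampled rather than chosen greedily, the policy keeps exploring for as long as no probability has collapsed to 0 or 1.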

Policy gradients
𝝅𝜽(a|s)
environment
feedback
action
Actions

Policy-based methods
max_𝜃 E[ ∑_{t=0}^{T} R(s_t) | 𝝅𝜽 ]
Policy
‣ Changing a single action can have a big impact on the return
‣ Changing the action distribution slightly has a smaller impact

Policy-Based methods
‣ Estimate the policy
‣ No value function
‣ For simpler problems
‣ Innate exploration through their stochastic nature
‣ Can be used together with supervised learning

Policy gradient
‣ Recent successes in video games, 3D locomotion, and Go
‣ Problems: sensitive to step size
‣ Slow progress
‣ Noise can mask the signal
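
The step-size sensitivity and noise issues can be seen even on a tiny problem. The sketch below runs a plain REINFORCE-style update on a two-armed bandit with noisy rewards. The bandit, its reward means (0 and 1), the noise level, and the function name `reinforce_bandit` are assumptions made for illustration, not the demo from the talk.

```python
import math
import random


def reinforce_bandit(step_size, episodes=2000, seed=0):
    """Train pi(a=1) = sigmoid(theta) with REINFORCE; return the final pi(a=1)."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(episodes):
        p1 = 1.0 / (1.0 + math.exp(-theta))
        action = 1 if rng.random() < p1 else 0
        # Noisy reward: action 1 is better on average (mean 1.0 vs 0.0).
        reward = rng.gauss(1.0 if action == 1 else 0.0, 0.5)
        # Gradient of log pi(action) with respect to theta for a sigmoid policy.
        grad_log_pi = (1.0 - p1) if action == 1 else -p1
        # REINFORCE update: move theta along reward * grad log pi.
        theta += step_size * reward * grad_log_pi
    return 1.0 / (1.0 + math.exp(-theta))


p_small_step = reinforce_bandit(step_size=0.001)  # cautious steps: slow progress
p_large_step = reinforce_bandit(step_size=0.1)    # larger steps: faster on this toy problem
print(p_small_step, p_large_step)
```

On this toy problem the larger step simply learns faster. On harder problems a large step can also move the policy too far on the basis of a noisy estimate, which is the sensitivity the slide refers to.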

Takeaway
‣ RL is useful
‣ Policy gradients have had a lot of success
‣ OpenAI's Gym is a great tool for testing RL algorithms

Training
Modern Machine Learning and Deep Learning
2-day course covering real-life Deep Learning examples
https://www.eventbrite.co.uk/e/modern-machine-learning-and-deep-learning-2-day-course-tickets-49603205523?aff=ebdssbdestsearch
Use discount code: IDEAIFORME

Thank you!
You can contact me at
www.ideai.io info@ideai.io
Newsletter:
subscribe@ideai.io

2. Leonardo De Marchi

5. SUPERVISED - TRAINING INPUT OUTPUT

6. SUPERVISED - TRAINING INPUT OUTPUT MODEL

7. SUPERVISED - SCORING INPUT OUTPUT MODEL

8. SUPERVISED - SCORING

9. UNSUPERVISED INPUT input clustered MODEL

11. Reinforcement learning

13. applications

15. AlphaZero

16. Robotics

19. games

20. A2C

21. GQN

24. our problem

27. Environment

30. Goal ‣ Maximise the total reward

31. Reinforcement Learning Algorithms

41. Demo
