Introduction to Reinforcement Learning

Reinforcement Learning is learning what to do – what action to take in a specific situation – in order to maximize some type of reward. It’s one of the most promising areas of Machine Learning today. It plays an important part in some very high-profile success stories of AI, such as mastering Go, learning to play computer games, autonomous driving, autonomous stock trading, and more. In this talk we’ll introduce the main theoretical and practical aspects of Reinforcement Learning, discuss its very distinctive set of challenges, and explore what the future looks like for self-training machines.

“Reinforcement learning is learning what to do [...] so as to maximize a numerical reward signal [...] by trying them.”
-- Reinforcement Learning: An Introduction, R. Sutton and A. Barto, MIT Press, 1998

MuJoCo - Multi-Joint dynamics with Contact

Microsoft AirSim
https://github.com/Microsoft/AirSim

Project Malmo
https://github.com/Microsoft/malmo

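$Q^\pi\big((\text{pos}_1, \text{speed}_1), \text{push left}\big) = 21$
$Q^\pi\big((\text{pos}_1, \text{speed}_1), \text{push right}\big) = 26$
State: $(\text{pos}_1, \text{speed}_1)$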

• Given a new experience, $(s, a, r, s')$ (Bellman Equation)

$s' = (\text{pos}_1, \text{speed}_1)$
$Q^\pi(s', \text{push left}) = 21$
$Q^\pi(s', \text{push right}) = 26$
Reward for the transition $s \to s'$: $-1$

• Given a new experience, $(s, \text{push right}, -1, s')$ (Bellman Equation)
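
As a worked illustration of the experience above, the following sketch plugs the slide's numbers into a one-step Bellman target. It assumes the policy $\pi$ acts greedily in $s'$ and a discount factor $\gamma = 0.9$. Neither is stated on this slide, so the final number is illustrative only.

```latex
\begin{align*}
\text{target} &= r + \gamma \max_{a'} Q^\pi(s', a') \\
              &= -1 + \gamma \cdot \max(21, 26) \\
              &= -1 + 0.9 \cdot 26 = 22.4 \qquad (\text{assuming } \gamma = 0.9)
\end{align*}
```

Under these assumptions, the estimate of $Q^\pi(s, \text{push right})$ would be moved toward 22.4.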

State   Action   Q(s, a)
S1      A1       0
S1      A2       -2
S2      A1       1
S2      A2       -1
S3      A1       3
…       …        …
S99     A1       100
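
A table like the one above could be held in a dictionary keyed by (state, action). The sketch below is only an illustration: it includes just the rows shown on the slide (the elided rows are left out), and the helper name `greedy_action` is hypothetical.

```python
# Q-table from the slide, keyed by (state, action).
# Only the rows visible on the slide are included; elided rows are omitted.
q_table = {
    ("S1", "A1"): 0,
    ("S1", "A2"): -2,
    ("S2", "A1"): 1,
    ("S2", "A2"): -1,
    ("S3", "A1"): 3,
    ("S99", "A1"): 100,
}


def greedy_action(q_table, state):
    """Return the action with the highest stored Q-value in `state`."""
    candidates = {a: q for (s, a), q in q_table.items() if s == state}
    if not candidates:
        # A pure lookup table has nothing to say about states it never saw.
        raise KeyError(f"no Q-values stored for state {state!r}")
    return max(candidates, key=candidates.get)


print(greedy_action(q_table, "S1"))  # A1
print(greedy_action(q_table, "S2"))  # A1
```

Looking up a state that is not in the table raises `KeyError`, which is one way to see why a plain table struggles with previously unseen states.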

• The agent should be able to deal with previously unseen states
• States might be only partially observable

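Model inputs: position, speed
Model outputs: Q(s, push left), Q(s, push right)
• Training: for an experience $(s, a, r, s')$, the loss compares the “target” value with the “current” estimate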
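
One way to picture the “target” versus “current” loss is the minimal sketch below. It swaps in a linear model (one weight vector per action) for whatever approximator the slide depicts, and the class name `LinearQ`, the discount `gamma`, and the learning rate `lr` are all assumptions made for illustration.

```python
ACTIONS = ["push left", "push right"]


class LinearQ:
    """Hypothetical stand-in model: Q(s, a) = w[a] . [position, speed, 1]."""

    def __init__(self):
        self.w = {a: [0.0, 0.0, 0.0] for a in ACTIONS}

    def q(self, state, action):
        position, speed = state
        w = self.w[action]
        return w[0] * position + w[1] * speed + w[2]

    def learn(self, s, a, r, s_next, done=False, gamma=0.9, lr=0.1):
        # "target": reward plus the discounted best value of the next state
        best_next = 0.0 if done else max(self.q(s_next, b) for b in ACTIONS)
        target = r + gamma * best_next
        # "current": what the model predicts right now
        current = self.q(s, a)
        error = target - current
        # Semi-gradient step on 0.5 * error**2 (the target is held constant)
        features = [s[0], s[1], 1.0]
        self.w[a] = [wi + lr * error * fi for wi, fi in zip(self.w[a], features)]
        return 0.5 * error ** 2


model = LinearQ()
experience = ((0.5, 0.0), "push right", -1, (0.6, 0.1))
loss_before = model.learn(*experience)
loss_after = model.learn(*experience)
print(loss_before, loss_after)
```

Repeating the same experience shrinks the loss, because each step moves the current estimate toward the target.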

qmodel.init('random')
policy.init('greedy')
for N episodes do:
    state = environment.init_episode()
    do:
        # choose an action from the current Q estimates
        action = policy.select_action(state, qmodel)
        (reward, next_state, done) = environment.step(state, action)
        # update the Q model from the new experience
        qmodel.learn((state, action, reward, next_state))
        state = next_state
    while (!done)
end for

do:
    action = policy.select_action(state, qmodel)
    (reward, next_state, done) = environment.step(state, action)
    qmodel.learn((state, action, reward, next_state))
    state = next_state
while (!done)
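
The loop above can be made concrete with a small runnable sketch. Everything beyond the loop structure is an assumption for illustration: the `Corridor` toy environment (walk from cell 0 to cell 4, reward -1 per step and +100 at the goal), the tabular `QTable` with zero rather than random initialization, the `EpsilonGreedy` policy (greedy with 10% random actions), and the values of `gamma`, `lr`, and `epsilon`. The `done` flag is also passed to `learn` so that terminal states are not bootstrapped.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to cell 4."""

    def init_episode(self):
        return 0

    def step(self, state, action):
        move = 1 if action == "right" else -1
        next_state = max(0, min(4, state + move))
        done = next_state == 4
        reward = 100 if done else -1
        return reward, next_state, done


class QTable:
    """Tabular Q model; unseen (state, action) pairs default to 0."""

    def __init__(self, actions, gamma=0.9, lr=0.5):
        self.actions, self.gamma, self.lr = actions, gamma, lr
        self.q = {}

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def learn(self, experience):
        state, action, reward, next_state, done = experience
        best_next = 0.0 if done else max(
            self.value(next_state, a) for a in self.actions)
        target = reward + self.gamma * best_next
        current = self.value(state, action)
        self.q[(state, action)] = current + self.lr * (target - current)


class EpsilonGreedy:
    """Greedy policy that takes a random action with probability epsilon."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions, self.epsilon = actions, epsilon
        self.rng = random.Random(seed)

    def select_action(self, state, qmodel):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: qmodel.value(state, a))


def train(episodes=200):
    actions = ["left", "right"]
    environment = Corridor()
    qmodel = QTable(actions)
    policy = EpsilonGreedy(actions)
    for _ in range(episodes):
        state = environment.init_episode()
        done = False
        while not done:
            action = policy.select_action(state, qmodel)
            reward, next_state, done = environment.step(state, action)
            qmodel.learn((state, action, reward, next_state, done))
            state = next_state
    return qmodel


qmodel = train()
greedy_path = [max(["left", "right"], key=lambda a: qmodel.value(s, a))
               for s in range(4)]
print(greedy_path)  # ['right', 'right', 'right', 'right']
```

After training, the greedy action in every non-terminal cell is `right`, the shortest route to the +100 reward.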

http://cntk.ai/
https://github.com/Microsoft/CNTK/tree/master/Examples/ReinforcementLearning

Image source: https://bons.ai/blog/deep-reinforcement-learning-enterprise

• Cost of acquiring experience
• Cost of failing
• Robotic Control
• Autonomous Vehicles
• IT Network Security
• Financial Trading
• Fleet Logistics
• Process Planning…

Introduction to Reinforcement Learning

1. Introduction to Reinforcement Learning. Technology Solutions Professional, Data & AI, Microsoft. Image source: xkcd.com

10. OpenAI Gym https://gym.openai.com/

11. Classic Control Problems

12. Atari Game Environments

13. MuJoCo - Multi-Joint dynamics with Contact

14. Robotics

15. Microsoft AirSim https://github.com/Microsoft/AirSim

19. Project Malmo https://github.com/Microsoft/malmo

22. [Figure: state variables p and s; rewards +100, -1, -1, -1, -1]

23. • Discount factor $\gamma < 1$: successive future rewards are weighted by $\gamma, \gamma^2, \gamma^3, \dots$
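
The discounting on this slide can be sketched in a few lines. The reward sequence below reuses the values from slide 22, but their order (four steps at -1, then +100) and the choice $\gamma = 0.9$ are assumptions made for illustration, and the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward k steps ahead by gamma**k."""
    total = 0.0
    # Work backwards so each step is: reward + gamma * (return of the rest)
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [-1, -1, -1, -1, 100]
print(round(discounted_return(rewards, gamma=0.9), 3))  # 62.171
```

With $\gamma < 1$, the +100 at the end counts for less the further away it is, so shorter routes to the goal score higher.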

33. approximate

41. 𝜆

42. https://senseis.xmp.net/?Joseki
