3. Reinforcement Learning
● Learning from interaction!
○ Driving a car,
○ Holding a conversation,
● Goal-directed approach
○ Closed-loop,
○ Reward oriented,
4. Reinforcement vs. Unsupervised Learning
● Unsupervised learning: finds hidden structures!
● ... in unlabeled data!
● RL does not rely on uncovering structure,
● RL maximizes a reward signal instead!
5. Exploration vs. Exploitation Dilemma
● Exploit known actions to obtain rewards!
● Explore new actions to perform better in the future!
● Neither Exploration nor Exploitation alone suffices!
● Closest to the human and animal learning!
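The dilemma above is commonly handled with an ε-greedy rule: exploit the best-known action most of the time, explore a random one with small probability ε. A minimal sketch in a bandit-style setting (the function names and the running-average update are illustrative, not from the slides):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))              # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

def update(q_values, counts, action, reward):
    """Incremental running average of observed rewards per action."""
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]
```

With ε = 0 the rule is purely greedy; raising ε trades immediate reward for information about the other actions.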
6. Examples
● Mobile Robot
○ Search for more trash,
○ Or find its way back to the battery-charging station,
● Adaptive Controller for Petrol Refinery
○ Optimize yield/cost/quality,
○ Specified marginal costs,
7. Agent & Environment
● Policy,
○ Mapping from states to actions,
● Reward,
○ Immediate signal; pain, pleasure,
● Value Function,
○ Farsighted judgement of states,
● Model,
○ Mimics the environment, enables planning,
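The four elements above can be sketched as plain tabular structures; the tiny two-state problem and its names here are hypothetical, chosen only to show how the pieces fit together:

```python
# Hypothetical tabular agent components for a two-state problem.
policy = {"s0": "right", "s1": "right"}   # policy: mapping state -> action
reward = {("s1", "right"): 1.0}           # reward: immediate signal per (state, action)
value = {"s0": 0.9, "s1": 1.0}            # value function: farsighted judgement of states
model = {("s0", "right"): "s1"}           # model: predicts the next state (mimics environment)

state = "s0"
action = policy[state]                    # act from the policy
next_state = model[(state, action)]       # the model mimics the environment: "s1"
r = reward.get((state, action), 0.0)      # 0.0 here; 1.0 when acting from "s1"
```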
8. Pick and Place Robot
Actions:
Voltages applied to the motors,
States:
Latest readings of joint angles and velocities,
Reward:
+1 for each successful pick-up, computed in the environment!
9. Goals & Markov Decision Process
Goals:
Maximize the cumulative reward in the long run!
Markov Decision Process:
A state retaining all relevant information satisfies the Markov Property!
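The Markov Property on the slide can be written out: the next state and reward depend only on the current state and action, not on the full history,

```latex
\Pr\{S_{t+1}=s',\, R_{t+1}=r \mid S_t, A_t\}
  = \Pr\{S_{t+1}=s',\, R_{t+1}=r \mid S_0, A_0, R_1, \ldots, S_t, A_t\}
```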
10. Markov Decision Process ctd.
A task is a finite MDP if,
● The state and action spaces are finite,
● The environment satisfies the Markov property,
Example: Recycling Robot
● Actively search for a can,
● Remain still and wait for a can,
● Go back to station,
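The recycling robot above can be written down as a small finite MDP over battery levels. The probabilities and rewards below are illustrative placeholders, not values from the slides:

```python
# States: battery "high" or "low"; actions as on the slide.
# (state, action) -> list of (probability, next_state, reward)
ALPHA, BETA = 0.9, 0.6        # assumed probabilities of keeping charge while searching
R_SEARCH, R_WAIT = 2.0, 1.0   # assumed rewards for searching / waiting

mdp = {
    ("high", "search"):   [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
    ("high", "wait"):     [(1.0, "high", R_WAIT)],
    ("low",  "search"):   [(BETA, "low", R_SEARCH), (1 - BETA, "high", -3.0)],  # ran flat: penalty
    ("low",  "wait"):     [(1.0, "low", R_WAIT)],
    ("low",  "recharge"): [(1.0, "high", 0.0)],
}

# Sanity check: outgoing transition probabilities sum to one.
for transitions in mdp.values():
    assert abs(sum(p for p, _, _ in transitions) - 1.0) < 1e-9
```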
16. Monte Carlo Methods
● Used in an algorithm that mimics policy iteration,
○ Policy Evaluation,
■ Average the observed returns for each (s, a) over episodes ==> Q
○ Policy Improvement,
■ Next policy from Q (Greedy Policy),
● Given s, the new policy selects the a that maximizes Q(s, · )
● Works in episodic problems ONLY!
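The evaluation/improvement loop above can be sketched as every-visit Monte Carlo control. The one-state episodic task and all names here are hypothetical; returns are undiscounted for simplicity:

```python
import random
from collections import defaultdict

def mc_control(generate_episode, actions, num_episodes=1000, epsilon=0.1):
    """Mimic policy iteration: average returns per (s, a) into Q,
    then improve the policy greedily w.r.t. Q (episodic tasks only)."""
    Q = defaultdict(float)   # (s, a) -> average return
    N = defaultdict(int)     # (s, a) -> visit count
    policy = {}              # s -> greedy action
    for _ in range(num_episodes):
        episode = generate_episode(policy, epsilon)  # [(s, a, r), ...]
        G = 0.0
        for s, a, r in reversed(episode):
            G += r  # undiscounted return
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]           # averages ==> Q
            policy[s] = max(actions, key=lambda b: Q[(s, b)])  # greedy improvement
    return Q, policy

def toy_episode(policy, epsilon, actions=("left", "right")):
    """Hypothetical one-state task: "right" ends the episode with +1,
    "left" costs -0.1 and the episode continues (capped at 10 steps)."""
    s, episode = "s0", []
    for _ in range(10):
        if s not in policy or random.random() < epsilon:
            a = random.choice(actions)   # explore
        else:
            a = policy[s]                # exploit
        r = 1.0 if a == "right" else -0.1
        episode.append((s, a, r))
        if a == "right":
            break
    return episode
```

Because returns are only available once an episode terminates, this scheme needs episodic problems, matching the last bullet above.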