SlideShare a Scribd company logo
1
Reinforcement Learning
in The Grid World problem
Author
Alireza Andalib
Learning Machine
2
3
4
5
6
Supervised Learning:
Example Class
Reinforcement Learning:
Situation Reward Situation Reward
…
RL
Supervised Learning
SystemInputs Outputs
Training Info = desired (target) outputs
Error = (target output – actual output)
7
RL
SystemInputs Outputs (“actions”)
Training Info = evaluations (“rewards” / “penalties”)
8
9
10}|Pr{),( ssaaas ttt ===π
11
13
14
15
16
17
Markov Decision Processes
18
Grid World
A B
B’
A’
19
20
Bellman
22
2525
1.7120 9.7461 3.1311 5.4209 1.0036
0.7994 2.9233 2.3299 1.9586 0.4665
0.0023 0.7899 07355 0.4364 0.2287-
0.7664- 0.8488- 0.0076 0.1855- 0.9621-
0.9949- 1.3554- 1.0946- 1.4766- 2.0021-
23
IPE
24
IPE
K100i,j
1.4008 9.5698 3.1841 5.4309 0.8827
0.6503 2.9231 1.9576 1.8581 0.3910
0.0303- 0.8137 0.7354 0.4787 0.2830-
0.4062- 0.0118- 0.0183 0.1828- 0.7333-
0.6535- 0.4780- 0.4594- 0.5763- 0.9488-
25
PI
26
PI
State
Go Right Jump Go Left Jump Go Left
Go Up Go Up Go Left Go Up Go Left
Go Up Go Up Go Up Go Up Go Left
Go Up Go Up Go Up Go Up Go Left
Go Up Go Up Go Up Go Up Go Left
27
28
 Horstmann, Cay. "GridWorld". horstmann.com.
Accessed September 15, 2008
 www.inf.ed.ac.uk/teaching/courses/rl
 www.math-info.univ-paris5.fr/~bouzy/Doc/AA2/Reinforcement
 www.cs.berkeley.edu/~pabbeel/cs287-fa12
 courses.cs.washington.edu/courses/cse473/12sp/s
lides/16-mdp.pdf
29
THANKS FORYOURATTENTION

More Related Content

Similar to Solve Grid world problem

Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
Usman Qayyum
 
Application of recursive perturbation approach for multimodal optimization
Application of recursive perturbation approach for multimodal optimizationApplication of recursive perturbation approach for multimodal optimization
Application of recursive perturbation approach for multimodal optimizationPranamesh Chakraborty
 
Julia: The language for future
Julia: The language for futureJulia: The language for future
Julia: The language for future
岳華 杜
 
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learningNUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS
 
make simple neural network python
make simple neural network pythonmake simple neural network python
make simple neural network python
아르떼소피
 
Imitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCSImitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCS
Preferred Networks
 
Introduction to MATLAB
Introduction to MATLAB Introduction to MATLAB
Introduction to MATLAB
COMSATS Abbottabad
 
The Language for future-julia
The Language for future-juliaThe Language for future-julia
The Language for future-julia
岳華 杜
 
MachineLearning_QLearningCircuit
MachineLearning_QLearningCircuitMachineLearning_QLearningCircuit
MachineLearning_QLearningCircuitSean Williams
 
Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version)
 Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version) Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version)
Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version)
Tobias Pfeiffer
 
AI and ML 101
AI and ML 101AI and ML 101
AI and ML 101
Rustem Zakiev
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
Rafael Ferreira da Silva
 
Measuring movements of golfers with an accelerometer
Measuring movements of golfers with an accelerometerMeasuring movements of golfers with an accelerometer
Measuring movements of golfers with an accelerometer
Changsu Jung
 
EKON22 Introduction to Machinelearning
EKON22 Introduction to MachinelearningEKON22 Introduction to Machinelearning
EKON22 Introduction to Machinelearning
Max Kleiner
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
Jun Young Park
 
Using Redux-Saga for Handling Side Effects
Using Redux-Saga for Handling Side EffectsUsing Redux-Saga for Handling Side Effects
Using Redux-Saga for Handling Side Effects
GlobalLogic Ukraine
 
20190907 Julia the language for future
20190907 Julia the language for future20190907 Julia the language for future
20190907 Julia the language for future
岳華 杜
 
The Ring programming language version 1.7 book - Part 81 of 196
The Ring programming language version 1.7 book - Part 81 of 196The Ring programming language version 1.7 book - Part 81 of 196
The Ring programming language version 1.7 book - Part 81 of 196
Mahmoud Samir Fayed
 
CI L11 Optimization 3 GlobalOptimization.pdf
CI L11 Optimization 3 GlobalOptimization.pdfCI L11 Optimization 3 GlobalOptimization.pdf
CI L11 Optimization 3 GlobalOptimization.pdf
SantiagoGarridoBulln
 

Similar to Solve Grid world problem (20)

Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Application of recursive perturbation approach for multimodal optimization
Application of recursive perturbation approach for multimodal optimizationApplication of recursive perturbation approach for multimodal optimization
Application of recursive perturbation approach for multimodal optimization
 
Julia: The language for future
Julia: The language for futureJulia: The language for future
Julia: The language for future
 
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learningNUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
 
make simple neural network python
make simple neural network pythonmake simple neural network python
make simple neural network python
 
Imitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCSImitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCS
 
Introduction to MATLAB
Introduction to MATLAB Introduction to MATLAB
Introduction to MATLAB
 
The Language for future-julia
The Language for future-juliaThe Language for future-julia
The Language for future-julia
 
ML .pptx
ML .pptxML .pptx
ML .pptx
 
MachineLearning_QLearningCircuit
MachineLearning_QLearningCircuitMachineLearning_QLearningCircuit
MachineLearning_QLearningCircuit
 
Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version)
 Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version) Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version)
Stop Guessing and Start Measuring - Benchmarking Practice (Poly Version)
 
AI and ML 101
AI and ML 101AI and ML 101
AI and ML 101
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
 
Measuring movements of golfers with an accelerometer
Measuring movements of golfers with an accelerometerMeasuring movements of golfers with an accelerometer
Measuring movements of golfers with an accelerometer
 
EKON22 Introduction to Machinelearning
EKON22 Introduction to MachinelearningEKON22 Introduction to Machinelearning
EKON22 Introduction to Machinelearning
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
 
Using Redux-Saga for Handling Side Effects
Using Redux-Saga for Handling Side EffectsUsing Redux-Saga for Handling Side Effects
Using Redux-Saga for Handling Side Effects
 
20190907 Julia the language for future
20190907 Julia the language for future20190907 Julia the language for future
20190907 Julia the language for future
 
The Ring programming language version 1.7 book - Part 81 of 196
The Ring programming language version 1.7 book - Part 81 of 196The Ring programming language version 1.7 book - Part 81 of 196
The Ring programming language version 1.7 book - Part 81 of 196
 
CI L11 Optimization 3 GlobalOptimization.pdf
CI L11 Optimization 3 GlobalOptimization.pdfCI L11 Optimization 3 GlobalOptimization.pdf
CI L11 Optimization 3 GlobalOptimization.pdf
 

Recently uploaded

Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
Kamal Acharya
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
Kamal Acharya
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSETECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
DuvanRamosGarzon1
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 

Recently uploaded (20)

Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSETECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 

Solve Grid world problem