Learning To Run
Deep Learning Course
Emanuele Ghelfi, Leonardo Arcari, Emiliano Gagliardi
https://github.com/MultiBeerBandits/learning-to-run
March 31, 2019
Politecnico di Milano
Our Goal
The goal of this project is to replicate the results of the Reason8 team
in the NIPS 2017 Learning To Run competition.¹
• Given a human musculoskeletal model and a physics-based
simulation environment
• Develop a controller that runs as fast as possible
¹ https://www.crowdai.org/challenges/nips-2017-learning-to-run
Background
Reinforcement Learning
Reinforcement Learning (RL) deals with sequential decision making
problems. At each timestep the agent observes the world state,
selects an action and receives a reward.
(Agent-environment loop: the agent samples an action a ∼ π(· ∣ s), the
environment returns a reward r and the next state s′ ∼ P(· ∣ s, a).)
Goal: maximize the expected discounted sum of rewards:
J_π = E[ ∑_{t=0}^{H} γ^t r(s_t, a_t) ]
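As a tiny numeric sanity check of the objective above, the snippet below (illustrative values only, not taken from the environment) computes the discounted return of a short reward sequence with γ = 0.99:

```python
# Discounted return of a reward sequence with gamma = 0.99
# (values are illustrative, not produced by the simulator).
gamma = 0.99
rewards = [1.0, 0.5, 0.0, 2.0]  # r(s_t, a_t) for t = 0, ..., H
J = sum(gamma**t * r for t, r in enumerate(rewards))
print(J)  # 1.0 + 0.495 + 0.0 + 1.9406 ≈ 3.436
```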
Deep Reinforcement Learning
The policy π_θ is encoded in a neural network with weights θ.
(Same interaction loop, but the action is now sampled from the network,
a ∼ π_θ(· ∣ s); the environment returns r and s′ ∼ P(· ∣ s, a).)
How? Gradient ascent over the policy parameters: θ′ = θ + η ∇_θ J_π
(policy gradient theorem).
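To make the update concrete, here is a minimal score-function (REINFORCE-style) sketch of the ascent step in PyTorch. It is a generic illustration of the policy gradient theorem, not the project's DDPG code; the Gaussian policy, network sizes, and learning rate are assumptions.

```python
import torch
from torch.distributions import Normal

# Generic policy-gradient ascent step (REINFORCE-style), for illustration only.
# theta' = theta + eta * grad_theta J is implemented as gradient descent on -J.
obs_dim, act_dim, eta = 34, 18, 1e-3
mean_net = torch.nn.Sequential(
    torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, act_dim)
)
log_std = torch.zeros(act_dim, requires_grad=True)       # state-independent std
optimizer = torch.optim.SGD(list(mean_net.parameters()) + [log_std], lr=eta)

def ascent_step(states, actions, returns):
    """states: [T, obs_dim], actions: [T, act_dim], returns: [T] discounted returns."""
    dist = Normal(mean_net(states), log_std.exp())
    log_probs = dist.log_prob(actions).sum(dim=-1)        # log pi_theta(a_t | s_t)
    loss = -(log_probs * returns).mean()                  # minimizing -J ascends J
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example call on random data for a 5-step episode:
ascent_step(torch.randn(5, obs_dim), torch.rand(5, act_dim), torch.ones(5))
```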
Learning To Run
(The deterministic policy maps a state s ∈ ℝ^34 to an action a ∈ [0, 1]^18;
the simulator returns the next state s′ ∼ P(· ∣ s, a).)
• The state space represents kinematic quantities of the joints and links.
• Actions represent muscle activations.
• The reward is proportional to the speed of the body. A penalty is applied
when the pelvis height falls below a threshold, and the episode restarts
(a minimal interaction sketch is given below).
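The sketch below shows roughly how a controller interacts with the competition environment. The RunEnv interface follows the 2017 osim-rl starter code as best recalled; exact names and defaults (visualize, difficulty) may differ between releases, so treat it as an assumption rather than a verified API.

```python
import numpy as np
from osim.env import RunEnv  # osim-rl package from the NIPS 2017 challenge

# Rough interaction loop; argument names follow the 2017 starter code and
# may differ in other osim-rl versions.
env = RunEnv(visualize=False)
state = env.reset(difficulty=0)                      # 34-dimensional observation
for _ in range(1000):
    action = np.random.uniform(0.0, 1.0, size=18)    # 18 muscle activations in [0, 1]
    state, reward, done, info = env.step(action)     # reward ~ forward speed of the body
    if done:                                         # e.g. pelvis dropped below the threshold
        state = env.reset(difficulty=0)
```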
Deep Deterministic Policy Gradient - DDPG
• A state-of-the-art algorithm in deep reinforcement learning.
• Off-policy.
• Actor-critic method.
• Effectively combines Deterministic Policy Gradient (DPG) and
Deep Q-Networks (DQN).
Deep Deterministic Policy Gradient - DDPG
Main characteristics of DDPG:
• Deterministic actor π : S → A.
• A replay buffer to break the correlation between consecutive training
samples.
• Separate target networks with soft updates to improve convergence
stability (sketched below).
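A minimal sketch of the last two ingredients, assuming a PyTorch implementation and a Polyak rate τ = 10⁻³ (both assumptions, not the project's exact code):

```python
import copy
import random
from collections import deque

import torch

# Replay buffer: sampling uniformly from past transitions breaks the
# temporal correlation of consecutive environment steps.
replay_buffer = deque(maxlen=1_000_000)
replay_buffer.extend((0.0, 0.0, 0.0, 0.0, False) for _ in range(128))  # dummy transitions
batch = random.sample(replay_buffer, 64)                               # uniform minibatch

# Soft (Polyak) target update: theta_target <- tau * theta + (1 - tau) * theta_target.
tau = 1e-3  # assumed rate

def soft_update(target_net, net):
    with torch.no_grad():
        for t_param, param in zip(target_net.parameters(), net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * param)

actor = torch.nn.Linear(34, 18)       # stand-in for the real actor network
target_actor = copy.deepcopy(actor)   # target starts as an exact copy
soft_update(target_actor, actor)      # called after every gradient step
```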
DDPG Improvements
We implemented several improvements over vanilla DDPG:
• Parameter noise (with layer normalization) and action noise to
improve exploration (an action-noise sketch follows this list).
• State and action flip (data augmentation).
• Relative Positions (feature engineering).
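As an example of the action-noise component, here is a standard Ornstein-Uhlenbeck process often paired with DDPG. This is a generic sketch; the θ and σ values are illustrative and not necessarily those used in the project.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(size, mu, dtype=np.float64)

    def reset(self):
        self.state[:] = self.mu

    def sample(self):
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state

noise = OUNoise(18)                                      # one noise dimension per muscle
noisy_action = np.clip(0.5 + noise.sample(), 0.0, 1.0)   # perturb an action, keep it in [0, 1]
```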
DDPG Improvements
(Flowchart of the distributed training loop: sampling jobs are dispatched
to the sampling workers; once the samples are ready they are stored in the
replay buffer and a training step runs; an evaluation job is then dispatched
to the testing workers and, when it is ready, statistics are displayed; the
loop repeats until the time budget expires.)
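The loop in the flowchart can be sketched roughly as below. This is a toy stand-in (collect_episode and evaluate_policy are placeholders, not the project's API) that only shows the dispatch/collect/train/evaluate structure:

```python
import multiprocessing as mp
import random
import time

def collect_episode(seed):
    """Placeholder sampling job: returns fake (state, action) pairs."""
    random.seed(seed)
    return [(random.random(), random.random()) for _ in range(100)]

def evaluate_policy(seed):
    """Placeholder evaluation job: returns fake statistics."""
    random.seed(seed)
    return {"distance_m": 30 * random.random()}

def run(n_workers=4, wall_clock_budget_s=5.0):
    replay_buffer, start = [], time.time()
    with mp.Pool(n_workers) as pool:
        while time.time() - start < wall_clock_budget_s:                 # "time expired?"
            for samples in pool.map(collect_episode, range(n_workers)):  # dispatch sampling jobs
                replay_buffer.extend(samples)                            # store in replay buffer
            # ... run DDPG training steps on minibatches here ...
            print(pool.apply(evaluate_policy, (0,)))                     # evaluation job + statistics

if __name__ == "__main__":
    run()
```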
DDPG Improvements
(Worker detail: each sampling/testing worker runs its own copy of the actor
π_θi, which selects actions a from states s; the collected samples are
dispatched to the shared replay buffer.)
Results
Results - Thread number impact
(Plot: distance covered (m) vs. training step (×10^5), comparing runs with
20 sampling threads and 10 sampling threads.)
Results - Ablation study
(Plot: distance covered (m) vs. training step (×10^5) for the four
combinations of state/action flip and parameter noise: Flip - PN,
Flip - No PN, No Flip - PN, No Flip - No PN; a second axis reports the
corresponding training times in hours.)
Thank you all!
Backup slides
Results - Full state vs Reduced State
(Plot: distance covered (m) vs. training step (×10^5), comparing the full
state representation with the reduced one.)
Actor-Critic networks
Actor: s ∈ ℝ^34 → 64 (ELU) → 64 (ELU) → a ∈ [0, 1]^18 (sigmoid output).
Critic: (s ∈ ℝ^34, a ∈ [0, 1]^18) → 64 (tanh) → 32 (tanh) → 1 (linear output, the Q-value).
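For concreteness, a PyTorch-style sketch of the two networks is given below. Layer sizes and activations are taken from the slide; the choice of PyTorch and the point at which the critic concatenates state and action are assumptions, not the project's exact implementation.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim=34, act_dim=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ELU(),
            nn.Linear(64, 64), nn.ELU(),
            nn.Linear(64, act_dim), nn.Sigmoid(),   # muscle activations in [0, 1]
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, obs_dim=34, act_dim=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
            nn.Linear(64, 32), nn.Tanh(),
            nn.Linear(32, 1),                       # scalar Q(s, a)
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

# Usage: the DDPG actor is trained to ascend Q(s, pi(s)).
s = torch.randn(1, 34)
actor, critic = Actor(), Critic()
q = critic(s, actor(s))
```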