Imitation Learning for Autonomous Driving in TORCS
Final Report
Yasunori Kudo
Mitsuru Kusumoto, Yasuhiro Fujita
SP Team
Imitation Learning
Imitation learning is an approach to sequential prediction problems in which expert demonstrations of good behavior are used to learn a controller.
In standard reinforcement learning, agents need to explore the environment many times to obtain a good policy, but sample efficiency is crucial in real environments. Expert demonstrations may help with this issue.
Examples:
• Legged locomotion [Ratliff 2006]
• Outdoor navigation [Silver 2008]
• Car driving [Pomerleau 1989]
• Helicopter flight [Abbeel 2007]
DAgger: Dataset Aggregation
(Figure: the DAgger loop. Execute the current policy and query the expert; the new data, with steering labels from the expert, is aggregated with all previous data into one dataset, and a new policy is trained on it by supervised learning.)
Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, 2011.
DAgger: Dataset Aggregation
Algorithm 3.6.1 (DAGGER):

Initialize D ← ∅.
Initialize π̂_1 to any policy in Π.
for i = 1 to N do
    Let π_i = β_i π* + (1 − β_i) π̂_i.
    Sample T-step trajectories using π_i.
    Get dataset D_i = {(s, π*(s))} of states visited by π_i and actions given by the expert.
    Aggregate datasets: D ← D ∪ D_i.
    Train classifier π̂_{i+1} on D (or use an online learner to get π̂_{i+1} given the new data D_i).
end for
Return the best π̂_i on validation.

The algorithm described above with β_i = I(i = 1), for I the indicator function, is a special case that often performs best in practice.
DAgger: Dataset Aggregation
• Collect new trajectories with π̂_1 (steering labels from the expert)
• New dataset D_1' = {(s, π*(s))}
• Aggregate datasets: D_1 = D_0 ∪ D_1'
• Train π̂_2 on D_1

DAgger: Dataset Aggregation
• Collect new trajectories with π̂_2 (steering labels from the expert)
• New dataset D_2' = {(s, π*(s))}
• Aggregate datasets: D_2 = D_1 ∪ D_2'
• Train π̂_3 on D_2

Expert policy vs. predicted policy: mixing in the predicted policy avoids collecting only states visited under the expert policy.
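A minimal sketch of the DAgger loop above (the `env`, `expert_action`, and `policy` interfaces below are placeholders for illustration, not the project's actual code):

```python
import random

def dagger(env, expert_action, policy, n_iters=10, horizon=1000):
    """DAgger: iteratively aggregate expert-labelled states and retrain."""
    dataset = []                        # aggregated (state, expert action) pairs
    for i in range(n_iters):
        beta = 1.0 if i == 0 else 0.0   # beta_i = I(i = 1): follow the expert only on the first pass
        state = env.reset()
        for _ in range(horizon):
            # Roll out a mixture of the expert and the current learned policy.
            action = expert_action(state) if random.random() < beta else policy.predict(state)
            # Label every visited state with the expert's action.
            dataset.append((state, expert_action(state)))
            state, done = env.step(action)
            if done:
                break
        policy.fit(dataset)             # supervised learning on the aggregated dataset
    return policy
```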
Experiments
• Pendulum and Pong in OpenAI Gym
• We compared the performance of DAgger with a standard RL algorithm (REINFORCE).

Pendulum swing-up benchmark task (figure from "Reinforcement Learning In Continuous Time and Space", Kenji Doya, 2000):
State: (θ, θ̇), Reward: cosθ, Control: torque

Pong:
State: 80×80 binary image, Reward: win +1, lose -1
Experiments - REINFORCE
REward Increment = Nonnegative Factor × Offset Reinforcement × Characteristic Eligibility

• Estimate the policy gradient:
  ∇_θ J(θ) = (1/M) Σ_{m=1}^{M} Σ_{t=1}^{T} ∇_θ log π(a_{m,t} | s_{m,t}; θ) ( Σ_{t'=t}^{T} γ^{t'-t} r_{m,t'} − b )
• Update the model parameters:
  θ_{i+1} = θ_i + α ∇_θ J(θ)

θ : model parameter
M : number of episodes
T : number of steps
γ : decay of reward
r : reward
b : baseline
π : policy
a : action
s : state

Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229-256, 1992.
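A compact sketch of the REINFORCE update above (NumPy only; the linear softmax policy and the simple mean-return baseline are assumptions made for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_update(theta, episodes, alpha=0.01, gamma=0.99):
    """episodes: list of trajectories, each a list of (state, action, reward) tuples."""
    baseline = np.mean([sum(r for _, _, r in ep) for ep in episodes])   # b: mean episode return
    grad = np.zeros_like(theta)
    for ep in episodes:
        rewards = [r for _, _, r in ep]
        for t, (s, a, _) in enumerate(ep):
            # Discounted return from step t onward, minus the baseline.
            ret = sum(gamma ** (k - t) * rewards[k] for k in range(t, len(rewards))) - baseline
            probs = softmax(theta @ s)                                  # pi(a | s; theta)
            grad += np.outer(np.eye(len(probs))[a] - probs, s) * ret    # grad log pi(a|s) * (G_t - b)
    return theta + alpha * grad / len(episodes)                         # theta <- theta + alpha * grad J(theta)
```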
Experiments - Multi Agent
(Figure: three agents, each running its own environment (http://192.168.0.1/8080, http://192.168.0.2/8080, http://192.168.0.3/8080), collect experience in parallel; each sends a gradient to the shared model, and the updated model parameters are sent back to every agent.)
With 3 agents, training is about 3 times faster than with a single agent.
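A conceptual sketch of this data flow (the actual system distributes agents over HTTP; here `collect_gradient` is a stub and multiprocessing stands in for the separate machines):

```python
from multiprocessing import Pool
import numpy as np

def collect_gradient(args):
    """One agent: run its own environment and return a gradient estimate.
    (Stub: a real worker would roll out episodes and estimate grad J(theta).)"""
    theta, seed = args
    return np.zeros_like(theta)

def parallel_update(theta, n_agents=3, alpha=0.01):
    # Each agent computes a gradient against the same parameters in parallel;
    # the shared model is then updated with their average.
    with Pool(n_agents) as pool:
        grads = pool.map(collect_gradient, [(theta, i) for i in range(n_agents)])
    return theta + alpha * np.mean(grads, axis=0)
```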
Results - Pendulum
(Plot: learning curves, REINFORCE vs. DAgger.)
Network: 3-layer perceptron, 3 → 200 → 2
Input: (cosθ, sinθ, θ̇)
Output: one of two actions
DAgger needs fewer episodes until convergence!
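A sketch of a forward pass through such a 3-200-2 network (NumPy; the ReLU activation, the random initialization, and the softmax over the two actions are assumptions, since the slide only gives the layer sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(200, 3)), np.zeros(200)   # 3 inputs -> 200 hidden units
W2, b2 = rng.normal(scale=0.1, size=(2, 200)), np.zeros(2)     # 200 hidden -> 2 actions

def action_probs(obs):
    """obs = (cos(theta), sin(theta), theta_dot); returns probabilities over the 2 actions."""
    h = np.maximum(0.0, W1 @ obs + b1)       # hidden layer with ReLU
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

print(action_probs(np.array([1.0, 0.0, 0.0])))   # example observation
```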
Results - Pong
(Plot: learning curves, REINFORCE vs. DAgger.)
Network: 3-layer perceptron, 6400 → 200 → 2
Input: 6400-dimensional (80×80) vector, the frame difference S_{t+1} − S_t
Output: Up or Down
Validation accuracy: 97.04%
DAgger needs fewer episodes until convergence!
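A sketch of the input construction implied above (the crop, downsampling, and background colour values are assumptions; the slide only specifies an 80×80 binary image and the frame difference S_{t+1} − S_t):

```python
import numpy as np

def preprocess(frame):
    """Turn a raw 210x160x3 Pong frame into a flattened 80x80 binary vector (length 6400)."""
    img = frame[35:195]                     # crop to the playing field
    img = img[::2, ::2, 0]                  # downsample by 2, keep one colour channel -> 80x80
    binary = (img != 144) & (img != 109)    # erase the two background colours
    return binary.astype(np.float32).ravel()

def network_input(prev_frame, cur_frame):
    # The 6400-dimensional input is the difference of consecutive preprocessed frames.
    return preprocess(cur_frame) - preprocess(prev_frame)
```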
Application to TORCS
• Car driving simulator game
• Try to improve Yoshida-san's projects
• Train a policy only from the vision sensor

7 training tracks: track4, track7, track18, …
3 test tracks: track8, track12, track16

Pipeline: Imitation Learning (expert: hand-crafted AI) → Transfer Learning → Reinforcement Learning
(Figure: CNN policy network taking the two most recent frames x_{t-1} and x_t, each 3×64×64, as input.)

The policy uses either discrete or continuous actions:
Discrete actions
• Steering wheel: (-1, 0, 1)
• Whether to brake: (0, 1)
Continuous actions
• Steering wheel: -1 to 1
• Accel: 0 to 1
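A small sketch of the two action parameterizations above (the dictionary keys and the clipping are illustrative assumptions; the slides only give the value ranges):

```python
import numpy as np

def discrete_action(steer_index, brake_flag):
    """steer_index in {0, 1, 2} selects steering from (-1, 0, 1); brake_flag in {0, 1}."""
    return {"steer": (-1, 0, 1)[steer_index], "brake": brake_flag}

def continuous_action(steer, accel):
    """Steering in [-1, 1], acceleration in [0, 1]."""
    return {"steer": float(np.clip(steer, -1.0, 1.0)),
            "accel": float(np.clip(accel, 0.0, 1.0))}
```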
Results - DAgger in TORCS
(Plots: test-track progress with discrete and continuous actions; "Expert can reach" marks how far the expert gets.)
• DAgger works well in different environments (no overfitting!).
• The agent cannot surpass the performance of the expert: most places where the agent fails are where the expert fails.
• The expert cannot reach the goal on all test tracks.
• An agent with continuous actions gradually becomes worse...
Experiments - Transfer Learning
• Experiment 1 (single-play): RL for faster and safer driving
• Experiment 2 (self-play): RL for racing battle

Experiment 1
Environments: Track 0 and 16
Rewards:
  Out of the track ⇒ -1
  Every 400 (track 0) or 200 (track 8) steps ⇒ mean speed (≈ 0 to 2.2)

Experiment 2
Environment: Track 0
Rewards:
  Out of the track ⇒ -1
  Overtaken by the opponent ⇒ -1
  Overtake the opponent ⇒ mean speed (≈ 0 to 2.2)

(Figure: network input images; dimensions 64×64 and 32×32.)
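A hedged sketch of the Experiment 1 reward described above (only the reward rules come from the slide; the step bookkeeping and the `speeds` history are assumptions):

```python
def single_play_reward(step, off_track, speeds, interval=400):
    """interval is 400 on track 0 and 200 in the other setting described above."""
    if off_track:
        return -1.0                                   # leaving the track is penalized
    if step > 0 and step % interval == 0:
        return sum(speeds[-interval:]) / interval     # mean speed over the last interval (~0 to 2.2)
    return 0.0                                        # no reward on other steps
```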
Results - Experiment 1 (single-play)
(Plots: Track 0 (goal: 400 steps) and Track 16 (goal: 1600 steps); each shows the expert's score and a moving average of the agent's score.)
• Transfer learning works well with the REINFORCE algorithm.
• Better driving than the expert in terms of both speed and safety.
• A well-trained agent seems to control its speed by steering alone (no braking).
Results - Experiment 2 (self-play)
(Plots: three settings, each showing agent and opponent scores with a moving average.)
• vs. expert (opponent = expert): RL not to be overtaken
• self-play 1: RL to overtake
• self-play 2: RL not to be overtaken
Conclusion and Future Works
Conclusion
• DAgger works well in various environments such as TORCS.
• DAgger is very effective as pre-training before RL.
Future Works
• Does imitation learning as pre-training cause the agent to get stuck in local minima?
• Could multi-task learning (e.g., simultaneously predicting whether another car is to the left or right) help to train autonomous driving?
Appendix
(Plots: comparison of baselines (with vs. without a baseline) and comparison of pre-training by DAgger (with vs. without pre-training).)