SlideShare a Scribd company logo
第1回 最新のML,CV,NLP

関連論文読み会
2017/06/18

@aimpast
Introduce myself
名前: 蓑手智紀 (みのて ともき)
経歴: 都立高専 → 豊橋技術科学大学 学部卒 → ABEJA入社(2017年4月)
専攻: Computer Science
研究の一環で深層強化学習でレースゲームをプレイするエージェントを作ったりしました。
2
Recurrent Environment Simulators
Silvia Chiappa, Sébastien Racaniere, Daan Wierstra & Shakir Mohamed
DeepMind, London, UK
Reinforcement Learning
Which is the best action in current condition?
4
Hall
Hall Player Hall Goal
Hall Wall Wall Wall
Wall Wall
Wall Wall Wall Wall Wall
Reinforcement Learning
5
Agent
Environment
state reward action
st rt at
• The agent interacts the environment at every time-step
• The agent observes current state
Reinforcement Learning
6
Agent
Environment
state
st
action
at
st
• And it selects action at
Reinforcement Learning
7
Agent
Environment
state reward action
atst+1 rt+1
• The action is passed to environment and modifies its state
• The agent receives next state and rewardst+1 rt+1
• The goal of the agent is to interact with environment in a way
that maximizes future reward.
• the future discounted future return at time t is given by
• The optical strategy is to maximize the expectation.
Reinforcement Learning
8
V⇡ = rt + rt+1 + 2
rt+2 =
1X
k=0
k
rt+k
Q⇡
(s, a) = E⇡ {V⇡|st = s, at = a}
: discount factor < 1.0
Abstract
• This research topic is a part of Reinforcement Learning.
• They introduce the model can predict next observation from previous
state and action and predicted/observed state.
• Models that can simulate how environments change in response to
actions can be used by agents to plan and act efficiently.
9
Introduction
10
Oh et al., ’Action-Conditional Video Prediction using Deep Network in Atari Games’
: observed frame (pixel)
: predicted frame (pixel)
Introduction
11
Introduction
12
: State
: state transition function : encoding function
: predicted frame (pixel)
: observed frame (pixel)
: decoding function
: action
: return or
Action-Dependent State Transition
Introduction
13
: State
: state transition function : encoding function
: predicted frame (pixel)
: observed frame (pixel)
: decoding function
: action
: return or
Prediction-Dependent State Transition
LSTM Block
14
tanh
+
σ
+
+
σ +
tanh
σ+
output
recurrent
recurrent
recurrent
recurrent
recurrent
input
input
input
input
f
i
o
output gate
input gate
forget gate
at-1
at-1
at-1
at-1
Recurrent Environment Simulators
The selection of the predicted frame or real frame
15
: Prediction-dependent transition
: Observation-dependent transition
Experiment
• 0% PDT
• 33% PDT
• 0%-20%-33%PDT
• Only observation-dependent transitions in the first 10,000 parameter updates
• Prediction-dependent:Observation-dependent= 8:2 for the the subsequent
100,000 parameters updates
• PDT:ODT=1:2 after the subsequence 100,000 parameters updates
• 46% PDT Alt.: Alternate between ODT and ODT from a time-step to the next.
• 46% PDT
• 67% PDT
• 0%-100% PDT:
• Only observation-dependent transitions in the first 1000 parameter updates
• Only prediction-dependent transitions in the subsequent parameter updates.
• 100% PDT
16
Train
The model Trained to minimize the mean squared error (MSE)
error between the observed time-series and predicted time-
series.
17
Results
18
Results
19
Results
20
Results
21
Discussion
We showed state-of-the-art results on Atari, and demonstrated the feasibility of
live human play in all three task families. The system is able to capture
complex and long-term interactions, and displays a sense of spatial and
temporal coherence that has, to our knowledge, not been demonstrated on
high-dimensional time-series data such as these.
Complex environments have compositional structure, such as independently
moving objects and other phenomena that only rarely interact. In order for our
simulators to better capture this compositional structure, we may need to
develop specialised functional forms and memory stores that are better suited
to dealing with independent representations and their interlinked interactions
and relationships.
22

More Related Content

Similar to Recurrent_environment_simulators

Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...
Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...
Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...
height
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
MLconf
 
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Universitat Politècnica de Catalunya
 
increasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learningincreasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learning
Ryo Iwaki
 
Price movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity marketPrice movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity market
Tc. Ying
 
Evolving Custom Communication Protocols
Evolving Custom Communication ProtocolsEvolving Custom Communication Protocols
Evolving Custom Communication Protocols
Wesley Faler
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
Jie-Han Chen
 
Semantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media PostsSemantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media Posts
Giulio Carducci
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
Ben Ball
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
DongHyun Kwak
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
재연 윤
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
Natan Katz
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Data Con LA
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
Databricks
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative ways
YasutoTamura1
 
[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand Manipulation[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand Manipulation
Seung Jae Lee
 
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
The Statistical and Applied Mathematical Sciences Institute
 
Reinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | EdurekaReinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | Edureka
Edureka!
 
Making smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement LearningMaking smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement Learning
Ruth Yakubu
 
Reinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RLReinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RL
Thom Lane
 

Similar to Recurrent_environment_simulators (20)

Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...
Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...
Approximate Dynamic Programming: A New Paradigm for Process Control & Optimiz...
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
 
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
 
increasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learningincreasing the action gap - new operators for reinforcement learning
increasing the action gap - new operators for reinforcement learning
 
Price movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity marketPrice movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity market
 
Evolving Custom Communication Protocols
Evolving Custom Communication ProtocolsEvolving Custom Communication Protocols
Evolving Custom Communication Protocols
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
 
Semantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media PostsSemantic Analysis to Compute Personality Traits from Social Media Posts
Semantic Analysis to Compute Personality Traits from Social Media Posts
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative ways
 
[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand Manipulation[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand Manipulation
 
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
 
Reinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | EdurekaReinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | Edureka
 
Making smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement LearningMaking smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement Learning
 
Reinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RLReinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RL
 

Recently uploaded

Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
Madan Karki
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
bijceesjournal
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
gowrishankartb2005
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
LAXMAREDDY22
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
TaghreedAltamimi
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
RamonNovais6
 

Recently uploaded (20)

Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
 

Recurrent_environment_simulators

  • 2. Introduce myself 名前: 蓑手智紀 (みのて ともき) 経歴: 都立高専 → 豊橋技術科学大学 学部卒 → ABEJA入社(2017年4月) 専攻: Computer Science 研究の一環で深層強化学習でレースゲームをプレイするエージェントを作ったりしました。 2
  • 3. Recurrent Environment Simulators Silvia Chiappa, Sébastien Racaniere, Daan Wierstra & Shakir Mohamed DeepMind, London, UK
  • 4. Reinforcement Learning Which is the best action in current condition? 4 Hall Hall Player Hall Goal Hall Wall Wall Wall Wall Wall Wall Wall Wall Wall Wall
  • 5. Reinforcement Learning 5 Agent Environment state reward action st rt at • The agent interacts the environment at every time-step
  • 6. • The agent observes current state Reinforcement Learning 6 Agent Environment state st action at st • And it selects action at
  • 7. Reinforcement Learning 7 Agent Environment state reward action atst+1 rt+1 • The action is passed to environment and modifies its state • The agent receives next state and rewardst+1 rt+1
  • 8. • The goal of the agent is to interact with environment in a way that maximizes future reward. • the future discounted future return at time t is given by • The optical strategy is to maximize the expectation. Reinforcement Learning 8 V⇡ = rt + rt+1 + 2 rt+2 = 1X k=0 k rt+k Q⇡ (s, a) = E⇡ {V⇡|st = s, at = a} : discount factor < 1.0
  • 9. Abstract • This research topic is a part of Reinforcement Learning. • They introduce the model can predict next observation from previous state and action and predicted/observed state. • Models that can simulate how environments change in response to actions can be used by agents to plan and act efficiently. 9
  • 10. Introduction 10 Oh et al., ’Action-Conditional Video Prediction using Deep Network in Atari Games’ : observed frame (pixel) : predicted frame (pixel)
  • 12. Introduction 12 : State : state transition function : encoding function : predicted frame (pixel) : observed frame (pixel) : decoding function : action : return or Action-Dependent State Transition
  • 13. Introduction 13 : State : state transition function : encoding function : predicted frame (pixel) : observed frame (pixel) : decoding function : action : return or Prediction-Dependent State Transition
  • 15. Recurrent Environment Simulators The selection of the predicted frame or real frame 15 : Prediction-dependent transition : Observation-dependent transition
  • 16. Experiment • 0% PDT • 33% PDT • 0%-20%-33%PDT • Only observation-dependent transitions in the first 10,000 parameter updates • Prediction-dependent:Observation-dependent= 8:2 for the the subsequent 100,000 parameters updates • PDT:ODT=1:2 after the subsequence 100,000 parameters updates • 46% PDT Alt.: Alternate between ODT and ODT from a time-step to the next. • 46% PDT • 67% PDT • 0%-100% PDT: • Only observation-dependent transitions in the first 1000 parameter updates • Only prediction-dependent transitions in the subsequent parameter updates. • 100% PDT 16
  • 17. Train The model Trained to minimize the mean squared error (MSE) error between the observed time-series and predicted time- series. 17
  • 22. Discussion We showed state-of-the-art results on Atari, and demonstrated the feasibility of live human play in all three task families. The system is able to capture complex and long-term interactions, and displays a sense of spatial and temporal coherence that has, to our knowledge, not been demonstrated on high-dimensional time-series data such as these. Complex environments have compositional structure, such as independently moving objects and other phenomena that only rarely interact. In order for our simulators to better capture this compositional structure, we may need to develop specialised functional forms and memory stores that are better suited to dealing with independent representations and their interlinked interactions and relationships. 22