SlideShare a Scribd company logo
1 of 26
Download to read offline
Machine
Learning
LABoratory
Seungjoon Lee. 2023-09-06. sjlee1218@postech.ac.kr
Learning to Model the World
with Language
Paper Summary
1
Contents
• Introduction
• Methods
• Experiments
2
Caution!!!
• This is the material I summarized a paper at my personal research meeting.
• Some of the contents may be incorrect!
• Some contributions, experiments are excluded intentionally, because they
are not directly related to my research interest.
• Methods are simpli
fi
ed for easy explanation.
• Please send me an email if you want to contact me: sjlee1218@postech.ac.kr
(for correction or addition of materials, ideas to develop this paper, or others).
3
Situations
• Most language-conditioned RL methods only use language as instructions
(eg. “Pick the blue box”)
• However, language does not always match the optimal action.
• Therefore, mapping language only to actions is a weak learning signal.
4
“Put the bowls away”
Complication
• On the other hand, human can predict the future using language.
• Human can predict environment dynamics (eg. “wrenches tightens nuts.”)
• Human can predict the future observations (eg. “the paper is outside.”)
5
Questions & Hypothesis
• Question:
• If we let reinforcement learning predict the future using language, will its
performance improve?
• Hypothesis:
• Predicting the future representation provides a rich learning signal for
agents of how language relates to the world.
• Rich learning signal: frequent, stable training signal.
6
Contributions
• DynaLang enables RL agents to use diverse types of language, for example
hint or dynamics, along with instruction.
• DynaLang suggests the future prediction self-supervised objective to improve
the training performance.
7
Why is This New?
• Previous language-based RL methods either used language as only
instructions or only description of environment.
• DynaLang uni
fi
es these settings so that agents learns from diverse types of
language.
• Previous works mostly directly condition policies on language to generate
actions.
• DynaLang proposes the future prediction objective to train the world model
which associates language, image, and dynamics.
8
Methods
9
Problem Setting
• Observation: , where is an image, is a language token.
• An agent chooses action , then environment returns:
• reward ,
• a
fl
ag whether the episode continues ,
• and next observation .
•
The agent’s goal is to maximize
ot = (xt, lt) xt lt
at
rt+1
ct+1
ot+1
E
[
T
∑
t=1
γt−1
rt
]
10
Method Outline
• DynaLang components
• World model: encodes current image obs and language into representation.
• RL agent: using encoded representation, acts to maximize the sum of
discounted reward.
11
Method - World Model
Outline
• World model components:
• Encoder - Decoder: learns to represent the current state.
• Sequence model: learns to predict the future state representation.
12
Method - World Model
Base model (previous work)
• DynaLang = Dreamer V3 + language + future prediction objective.
• Dreamer V3 learns to compute compact representations of current state, and
learns how these concepts change by actions.
13
Architecture of Dreamer V3
Method - World Model
Incorporation of language
• DynaLang incorporates language into the encoder-decoder of Dremer V3.
• By this, DynaLang gets representations unifying visual observations and
languages.
14
Method - World Model
Prediction of the future
• DynaLang adds the future representation prediction into the sequence model
of Dreamer V3.
• Future representation prediction lets the agent extract the information from
language, relating to the dynamics of multiple modalities.
15
Method - World Model
Model Losses
• World model loss: , where
• Image loss
• Language loss
• Reward loss
• Continue loss
• Regularizer , where sg is stop-gradient
• Future prediction loss
Lx + Ll + Lr + Lc + Lreg + Lpred
Lx = || ̂
xt − x||2
2
Ll = categorical_cross_entropy( ̂
lt, lt)
Lr = ( ̂
rt − rt)2
Lc = binary_cross_entropy( ̂
ct, ct)
Lreg = βreg max(1,KL[zt |sg( ̂
zt)])
Lpred = βpred max(1,KL[sg(zt), ̂
zt])
16
Method - RL Agent
Outline
• The used RL agent is a simple actor critic agent.
• Actor:
• Critic:
• Note that the RL agent is not conditioned on language directly.
π(at |zt, ht)
V(ht, zt)
17
Method - RL Agent
Environment interaction
• The RL agent interacts with environment using the encoded representation
and history .
zt
ht
18
Method - RL Agent
Training
• Let , the estimated discounted sum of
future rewards.
• Critic loss:
• Actor loss: , maximizing the return estimate
• The agent is trained only using imagined rollout generated by the world model.
• The agent is trained by the action of the agent and the predicted states, rewards.
Rt = rt + γct ((1 − λ)V (zt+1, ht+1) + λRt+1)
Lϕ = (Vϕ(zt, ht) − Rt)
2
Lθ = − (Rt − V(zt, ht)) log πθ(at |ht, zt)
19
Experiments
20
Experiments 1 - Diverse Types of Language
Questions
• Questions to address:
• Can DynaLang use diverse types of language along with instruction?
• If can, does it improve task performance?
21
Experiments 1 - Diverse Types of Language
Setup
• Env: HomeGrid
• multitask grid world where agents receive task
instruction in language but also language hints.
• Agents gets a reward of 1 when a task is completed,
and then a new task is sampled.
• Therefore, agents must complete as many tasks
as possible before the episode terminates in 100
steps.
•
22
HomeGrid env. Agents receive 3 typess of hints.
Experiments 1 - Diverse Types of Language
Results
• Baselines: model-free o
ff
-policy algorithms, IMPALA, R2D2.
• Simply image embeddings, language embeddings are conditioned to policy.
• DynaLang solves more tasks with hints, but simple language-conditioned RL
get worse with hints.
23
HomeGrid training performance after 50M steps (2 seeds)
Experiments 2 - Future Prediction
Questions
• Questions to address:
• Is adding future prediction more e
ff
ective than using language to only
generate actions?
24
Experiments 2 - Future Prediction
Setup
• Env: Messenger
• grid world where agents should deliver a message
while avoiding enemies using text manuals.
• Agents must understand manuals and relate them to
the environment to achieve high score.
25
Messenger env. Agent get text manuals.
Experiments 2 - Future Prediction
Results
• EMMA is added to be compared:
• Language + gridworld speci
fi
c method, using language only to generate action.
• Only DynaLang can learn from S3, the most di
ffi
cult setting.
• Adding future prediction helps the training more than only action generation.
• However, the authors do not include ablation studies which exclude the future
prediction loss from their architecture.
26
Messenger training performance (2 seeds). S1 is most easy, S3 is most hard.

More Related Content

Similar to 230906 paper summary - learning to world model with language - public.pdf

Neural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptxNeural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptxssuser2624f71
 
Cse115 lecture02overviewofprogramming
Cse115 lecture02overviewofprogrammingCse115 lecture02overviewofprogramming
Cse115 lecture02overviewofprogrammingMd. Ashikur Rahman
 
Big Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScriptBig Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScriptIgalia
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2ananth
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta
 
problem solving skills and its related all information
problem solving skills and its related all informationproblem solving skills and its related all information
problem solving skills and its related all informationssuserf86fba
 
On the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of PythonOn the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of PythonTakeshi Akutsu
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of pythonYung-Yu Chen
 
CS8461 - Design and Analysis of Algorithms
CS8461 - Design and Analysis of AlgorithmsCS8461 - Design and Analysis of Algorithms
CS8461 - Design and Analysis of AlgorithmsKrishnan MuthuManickam
 
The Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with androidThe Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with androidStanojko Markovik
 
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법NAVER D2
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...MOVING Project
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingMinh Pham
 
Introduction to machine_learning
Introduction to machine_learningIntroduction to machine_learning
Introduction to machine_learningKiran Lonikar
 

Similar to 230906 paper summary - learning to world model with language - public.pdf (20)

Neural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptxNeural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptx
 
Cse115 lecture02overviewofprogramming
Cse115 lecture02overviewofprogrammingCse115 lecture02overviewofprogramming
Cse115 lecture02overviewofprogramming
 
Big Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScriptBig Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScript
 
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
GDRR Opening Workshop - Deep Reinforcement Learning for Asset Based Modeling ...
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLP
 
problem solving skills and its related all information
problem solving skills and its related all informationproblem solving skills and its related all information
problem solving skills and its related all information
 
On the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of PythonOn the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of Python
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of python
 
CS8461 - Design and Analysis of Algorithms
CS8461 - Design and Analysis of AlgorithmsCS8461 - Design and Analysis of Algorithms
CS8461 - Design and Analysis of Algorithms
 
The Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with androidThe Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with android
 
Python4HPC.pptx
Python4HPC.pptxPython4HPC.pptx
Python4HPC.pptx
 
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
 
Entity2rec recsys
Entity2rec recsysEntity2rec recsys
Entity2rec recsys
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Introduction to machine_learning
Introduction to machine_learningIntroduction to machine_learning
Introduction to machine_learning
 

Recently uploaded

zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Jshifa
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 

Recently uploaded (20)

9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 

230906 paper summary - learning to world model with language - public.pdf

  • 1. Machine Learning LABoratory Seungjoon Lee. 2023-09-06. sjlee1218@postech.ac.kr Learning to Model the World with Language Paper Summary 1
  • 3. Caution!!! • This is the material I summarized a paper at my personal research meeting. • Some of the contents may be incorrect! • Some contributions, experiments are excluded intentionally, because they are not directly related to my research interest. • Methods are simpli fi ed for easy explanation. • Please send me an email if you want to contact me: sjlee1218@postech.ac.kr (for correction or addition of materials, ideas to develop this paper, or others). 3
  • 4. Situations • Most language-conditioned RL methods only use language as instructions (eg. “Pick the blue box”) • However, language does not always match the optimal action. • Therefore, mapping language only to actions is a weak learning signal. 4 “Put the bowls away”
  • 5. Complication • On the other hand, human can predict the future using language. • Human can predict environment dynamics (eg. “wrenches tightens nuts.”) • Human can predict the future observations (eg. “the paper is outside.”) 5
  • 6. Questions & Hypothesis • Question: • If we let reinforcement learning predict the future using language, will its performance improve? • Hypothesis: • Predicting the future representation provides a rich learning signal for agents of how language relates to the world. • Rich learning signal: frequent, stable training signal. 6
  • 7. Contributions • DynaLang enables RL agents to use diverse types of language, for example hint or dynamics, along with instruction. • DynaLang suggests the future prediction self-supervised objective to improve the training performance. 7
  • 8. Why is This New? • Previous language-based RL methods either used language as only instructions or only description of environment. • DynaLang uni fi es these settings so that agents learns from diverse types of language. • Previous works mostly directly condition policies on language to generate actions. • DynaLang proposes the future prediction objective to train the world model which associates language, image, and dynamics. 8
  • 10. Problem Setting • Observation: , where is an image, is a language token. • An agent chooses action , then environment returns: • reward , • a fl ag whether the episode continues , • and next observation . • The agent’s goal is to maximize ot = (xt, lt) xt lt at rt+1 ct+1 ot+1 E [ T ∑ t=1 γt−1 rt ] 10
  • 11. Method Outline • DynaLang components • World model: encodes current image obs and language into representation. • RL agent: using encoded representation, acts to maximize the sum of discounted reward. 11
  • 12. Method - World Model Outline • World model components: • Encoder - Decoder: learns to represent the current state. • Sequence model: learns to predict the future state representation. 12
  • 13. Method - World Model Base model (previous work) • DynaLang = Dreamer V3 + language + future prediction objective. • Dreamer V3 learns to compute compact representations of current state, and learns how these concepts change by actions. 13 Architecture of Dreamer V3
  • 14. Method - World Model Incorporation of language • DynaLang incorporates language into the encoder-decoder of Dremer V3. • By this, DynaLang gets representations unifying visual observations and languages. 14
  • 15. Method - World Model Prediction of the future • DynaLang adds the future representation prediction into the sequence model of Dreamer V3. • Future representation prediction lets the agent extract the information from language, relating to the dynamics of multiple modalities. 15
  • 16. Method - World Model Model Losses • World model loss: , where • Image loss • Language loss • Reward loss • Continue loss • Regularizer , where sg is stop-gradient • Future prediction loss Lx + Ll + Lr + Lc + Lreg + Lpred Lx = || ̂ xt − x||2 2 Ll = categorical_cross_entropy( ̂ lt, lt) Lr = ( ̂ rt − rt)2 Lc = binary_cross_entropy( ̂ ct, ct) Lreg = βreg max(1,KL[zt |sg( ̂ zt)]) Lpred = βpred max(1,KL[sg(zt), ̂ zt]) 16
  • 17. Method - RL Agent Outline • The used RL agent is a simple actor critic agent. • Actor: • Critic: • Note that the RL agent is not conditioned on language directly. π(at |zt, ht) V(ht, zt) 17
  • 18. Method - RL Agent Environment interaction • The RL agent interacts with environment using the encoded representation and history . zt ht 18
  • 19. Method - RL Agent Training • Let , the estimated discounted sum of future rewards. • Critic loss: • Actor loss: , maximizing the return estimate • The agent is trained only using imagined rollout generated by the world model. • The agent is trained by the action of the agent and the predicted states, rewards. Rt = rt + γct ((1 − λ)V (zt+1, ht+1) + λRt+1) Lϕ = (Vϕ(zt, ht) − Rt) 2 Lθ = − (Rt − V(zt, ht)) log πθ(at |ht, zt) 19
  • 21. Experiments 1 - Diverse Types of Language Questions • Questions to address: • Can DynaLang use diverse types of language along with instruction? • If can, does it improve task performance? 21
  • 22. Experiments 1 - Diverse Types of Language Setup • Env: HomeGrid • multitask grid world where agents receive task instruction in language but also language hints. • Agents gets a reward of 1 when a task is completed, and then a new task is sampled. • Therefore, agents must complete as many tasks as possible before the episode terminates in 100 steps. • 22 HomeGrid env. Agents receive 3 typess of hints.
  • 23. Experiments 1 - Diverse Types of Language Results • Baselines: model-free o ff -policy algorithms, IMPALA, R2D2. • Simply image embeddings, language embeddings are conditioned to policy. • DynaLang solves more tasks with hints, but simple language-conditioned RL get worse with hints. 23 HomeGrid training performance after 50M steps (2 seeds)
  • 24. Experiments 2 - Future Prediction Questions • Questions to address: • Is adding future prediction more e ff ective than using language to only generate actions? 24
  • 25. Experiments 2 - Future Prediction Setup • Env: Messenger • grid world where agents should deliver a message while avoiding enemies using text manuals. • Agents must understand manuals and relate them to the environment to achieve high score. 25 Messenger env. Agent get text manuals.
  • 26. Experiments 2 - Future Prediction Results • EMMA is added to be compared: • Language + gridworld speci fi c method, using language only to generate action. • Only DynaLang can learn from S3, the most di ffi cult setting. • Adding future prediction helps the training more than only action generation. • However, the authors do not include ablation studies which exclude the future prediction loss from their architecture. 26 Messenger training performance (2 seeds). S1 is most easy, S3 is most hard.