Kyonggi Univ. AI Lab.
STOCHASTIC LATENT ACTOR-CRITIC: DEEP REINFORCEMENT LEARNING WITH A LATENT VARIABLE MODEL
2020.11.16
정규열
Artificial Intelligence Lab
Kyonggi University
Index
• Background
• SLAC (stochastic latent actor-critic)
• Experiments
• Conclusion and opinions
Background
• Learning directly from high-dimensional images is difficult.
• Two problems have to be solved:
  - Representation learning
  - Task learning
• The paper proposes SLAC.
  - It learns a latent representation from high-dimensional images, using a VAE (variational autoencoder).
  - It runs reinforcement learning on the latent representation, using Soft Actor-Critic.
• Original authors' code (TensorFlow): https://github.com/alexlee-gk/slac
• PyTorch code: https://github.com/ku2482/slac.pytorch
SLAC (STOCHASTIC LATENT ACTOR-CRITIC)
• Training procedure
  - Stage 1: latent learning (3 h)
    - Choose actions at random to collect actions and images.
    - Train the latent model on the collected images.
  - Stage 2: latent learning and reinforcement learning (20 h)
    - Run reinforcement learning on the learned latent.
    - Use Soft Actor-Critic, which encourages exploration.
  - Training took almost 24 hours on a 2080 Ti GPU.
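The two stages above can be sketched as a schedule of which components are updated at each step. This is only an illustrative outline: the function name `slac_schedule` and the step counts are hypothetical, and the linked repositories may organize training differently.

```python
from typing import Iterator, Tuple


def slac_schedule(
    latent_pretrain_steps: int, joint_steps: int
) -> Iterator[Tuple[int, Tuple[str, ...]]]:
    """Yield (step, updates) pairs for a two-stage schedule.

    Stage 1 updates only the latent model. Stage 2 updates the latent
    model, the critic, and the actor on every step.
    """
    for step in range(latent_pretrain_steps):
        yield step, ("latent",)
    for step in range(latent_pretrain_steps, latent_pretrain_steps + joint_steps):
        yield step, ("latent", "critic", "actor")


# Hypothetical step counts, chosen only to keep the example small.
schedule = list(slac_schedule(latent_pretrain_steps=2, joint_steps=3))
```

Keeping the latent update in stage 2 mirrors the slide's description that latent learning continues while reinforcement learning runs.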
• Stage 1: train the latent model first.
  - Collect data for a set number of time steps (states, actions, and so on).
  - Train the VAE on this data.
  - After training, the model yields a proper latent z.
  - [Figure: model diagram with the state as input. In practice a CNN is used.]
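A minimal sketch of the stage 1 data collection, assuming a toy environment. `ToyEnv`, its 4-value "images", and the reward are all hypothetical stand-ins for a real image-based task; only the idea of acting at random and storing what was seen comes from the slide.

```python
import random
from typing import List, Tuple

Transition = Tuple[List[float], float, float]  # (observation, action, reward)


class ToyEnv:
    """Hypothetical environment: 4-value observations, scalar actions in [-1, 1]."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def reset(self) -> List[float]:
        return [self._rng.random() for _ in range(4)]

    def step(self, action: float) -> Tuple[List[float], float]:
        observation = [self._rng.random() for _ in range(4)]
        reward = -abs(action)  # arbitrary toy reward
        return observation, reward


def collect_random_data(env: ToyEnv, num_steps: int, seed: int = 0) -> List[Transition]:
    """Act uniformly at random for num_steps and store each transition."""
    rng = random.Random(seed)
    buffer: List[Transition] = []
    observation = env.reset()
    for _ in range(num_steps):
        action = rng.uniform(-1.0, 1.0)
        next_observation, reward = env.step(action)
        buffer.append((observation, action, reward))
        observation = next_observation
    return buffer


buffer = collect_random_data(ToyEnv(), num_steps=10)
```

The resulting buffer is what the latent model would be trained on before any policy learning starts.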
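• VAE (variational autoencoder)
  - Extracts the essential information (the latent) by reducing dimensionality.
  - [Figure: Encoder → dimensionality reduction → Decoder]
  - Variational inference: approximate the latent distribution with a simple probability distribution.
    p(z | x) ≈ q(z | x)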
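For reference, a variational approximation of this kind is usually trained by maximizing the evidence lower bound (ELBO). The block below is the standard single-step VAE form, not SLAC's sequential objective; the subscripts \theta (decoder) and \phi (encoder) are notation assumed here and do not appear on the slide.

```latex
\begin{align*}
\log p_\theta(x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
     - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
     + D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big) \\
  &\ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
     - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
\end{align*}
```

The first term rewards accurate reconstruction and the second keeps the approximate posterior close to the prior. The gap between the two sides is the KL divergence to the true posterior, which is why maximizing the bound improves the approximation.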
• Stage 2: train the latent model and run reinforcement learning together.
  - Soft Actor-Critic is adopted.
  - [Figure: three update steps: latent learning, critic learning, actor learning]
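As an illustration of the critic update, here is the regression target used in standard Soft Actor-Critic, written for scalar inputs. The function name and arguments are hypothetical, and whether the linked SLAC implementations compute the target exactly this way is an assumption.

```python
def soft_q_target(
    reward: float,
    next_q1: float,
    next_q2: float,
    next_log_prob: float,
    alpha: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Standard SAC critic target.

    target = r + gamma * (min(Q1', Q2') - alpha * log pi(a' | z'))
    The minimum over two critics reduces overestimation, and the
    -alpha * log pi term is the entropy bonus.
    """
    if done:
        # No bootstrapping past a terminal transition.
        return reward
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * soft_value


target = soft_q_target(reward=1.0, next_q1=2.0, next_q2=3.0, next_log_prob=-1.0, alpha=0.2)
```

The actor is then updated to prefer actions with a high soft value, so both updates share the same entropy weight alpha.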
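• Why SAC (Soft Actor-Critic) is adopted
  - To address the trade-off between exploration and exploitation.
  - To address the sample inefficiency of on-policy methods.
  - [Figure: the standard RL objective compared with the entropy-regularized RL objective, with the entropy term and its hyperparameter labeled]
  - Entropy term
    - The agent explores more.
    - The risk of trying actions with very low reward is also reduced.
  - Hyperparameter α
    - Controls how strongly the entropy term is weighted.
    - Option 1: use a fixed value.
    - Option 2: use a variable value, adjusted according to the entropy.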
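The two objectives being contrasted can be written as follows. This is the usual maximum-entropy RL formulation from the Soft Actor-Critic literature, given here as a reference sketch; the slide's own equations were images and are not recoverable from the text.

```latex
\begin{align*}
J_{\text{standard}}(\pi)
  &= \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \pi}\big[\, r(s_t, a_t) \,\big] \\
J_{\text{entropy}}(\pi)
  &= \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \pi}\big[\, r(s_t, a_t)
     + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\big]
\end{align*}
```

Setting \alpha = 0 recovers the standard objective, while a larger \alpha gives more weight to keeping the policy stochastic.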
Experiments
• Experimental environments
  - DeepMind Control: cheetah, walker, ball-in-cup catch, finger spin
  - OpenAI Gym: half cheetah, walker, hopper, ant
• Example environment (cheetah): [figure]
• Quantitative evaluation
  - Comparison with other models that learn from images (DeepMind Control): overall, the proposed SLAC performs comparatively well.
  - Comparison with other models that learn from images (OpenAI Gym): overall, the proposed SLAC performs comparatively well.
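• Qualitative evaluation (cheetah)
  - [Figure: encoder-decoder diagram and image sequences compared row by row]
    - Ground truth
    - Image sequence generated from the decoder
    - Image sequence generated from the latent
    - Image sequence generated from the encoder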
• Our own experimental results (cheetah): latent model
  - [Figure: decoder loss and KL loss curves]
  - The model handled the high-dimensional images better as training progressed.
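The two curves tracked here correspond to the two terms of a VAE-style loss. The sketch below shows one common way to compute them for a single sample. It assumes a fixed-variance Gaussian decoder and a standard normal prior, whereas SLAC itself uses a sequential model with learned priors, so treat the function names and the exact form as illustrative only.

```python
import math
from typing import Sequence


def decoder_loss(image: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Sum of squared errors between an image and its reconstruction.

    Up to a scale factor and an additive constant, this is the negative
    log-likelihood under a fixed-variance Gaussian decoder (assumed here).
    """
    return sum((x - y) ** 2 for x, y in zip(image, reconstruction))


def kl_loss(mean: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mean, exp(log_var)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, log_var))


def latent_loss(
    image: Sequence[float],
    reconstruction: Sequence[float],
    mean: Sequence[float],
    log_var: Sequence[float],
) -> float:
    """Total loss: reconstruction term plus KL term."""
    return decoder_loss(image, reconstruction) + kl_loss(mean, log_var)
```

A falling decoder loss means reconstructions are improving, while the KL loss measures how far the encoder's distribution has moved from the prior.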
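• Our own experimental results (cheetah): reinforcement learning
  - [Figure: return, α value, and entropy curves]
  - Performance was at a level similar to that reported in the paper.
  - The degree of exploration changed with the entropy value.
  - The α value was adjusted accordingly.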
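The coupling between entropy and α can be illustrated with the automatic temperature update used in standard Soft Actor-Critic. This is a scalar sketch with a hand-derived gradient; `update_log_alpha`, the learning rate, and the target entropy value are hypothetical, and the experiment's actual settings are not given in the slides.

```python
import math
from typing import Sequence


def update_log_alpha(
    log_alpha: float,
    log_probs: Sequence[float],
    target_entropy: float,
    lr: float = 0.1,
) -> float:
    """One gradient-descent step on the SAC temperature loss.

    Loss:     L(log_alpha) = -log_alpha * mean(log_prob + target_entropy)
    Gradient: dL/d(log_alpha) = entropy_estimate - target_entropy,
              where entropy_estimate = -mean(log_prob).
    """
    entropy_estimate = -sum(log_probs) / len(log_probs)
    gradient = entropy_estimate - target_entropy
    return log_alpha - lr * gradient


# Policy entropy (0.5) is below the target (1.0), so alpha is pushed up.
new_log_alpha = update_log_alpha(0.0, log_probs=[-0.5, -0.5], target_entropy=1.0)
alpha = math.exp(new_log_alpha)
```

When the policy's entropy falls below the target, α rises and exploration is rewarded more. When entropy is above the target, α falls.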
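Conclusion and opinions
• Conclusions of the paper
  - The goal is reinforcement learning from high-dimensional images.
  - Learning proceeds through a latent representation.
    - Variational inference is performed with a VAE-based model.
  - Reinforcement learning then proceeds with Soft Actor-Critic.
    - It can address the trade-off between exploration and exploitation.
    - It can address the sample inefficiency of on-policy methods.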
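• Personal opinions
  - When learning from images
    - In a complex environment, training the latent model alone will likely take a long time. For cheetah it took 3 hours.
    - If the position from which the image is projected (the camera viewpoint) changes, the model has to be retrained.
    - Running training in parallel seems advisable.
  - On α in Soft Actor-Critic (from personal experience)
    - For easy tasks, a fixed value is fine.
    - The more complex the task, the better a variable value seems to be.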
