- Project name: HomeNavi
- Talk title: 3D Environment HOMENavi
- Presenters: 이의령 (RL Korea) / 양홍선 (Korea University)
- Summary: We introduce recent research directions and the vision for reinforcement-learning-based navigation in 3D environments. We discuss the advantages and disadvantages of an RL approach compared with the SLAM-based navigation methods traditionally used in robotics, and survey the recently released 3D reinforcement learning environments. We also give brief overviews of the baseline papers and share lessons learned from running the experiments ourselves.
3. Project Introduction
3D Environment-based Home Navigation
• House (Indoor) 3D Dataset
• Reinforcement Learning Environment
• Performing instruction-based tasks such as 'Go to Kitchen'
9. Mobile Robot
A mobile robot is a robot that is capable of locomotion. (Wikipedia)
Category: Navigation
• Driving: Path Planning, Obstacle Avoidance, Recognizing the surroundings
• Localization & Mapping: Dead Reckoning, Landmark, SLAM
Credit : Machine Learning & Robotics / Geonhee Lee
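To make the table's dead-reckoning entry concrete, here is a minimal sketch (pure Python, with hypothetical velocity values — not from the talk) that integrates wheel-odometry commands into a 2D pose. It also shows why dead reckoning alone is insufficient: simple Euler integration drifts, motivating the landmark and SLAM entries below it.

```python
import math

def dead_reckon(pose, v, omega, dt):
    """Integrate linear velocity v and angular velocity omega over dt
    to update a 2D pose (x, y, theta). Euler integration, so the
    estimate drifts over time, which is why landmark-based correction
    and SLAM are needed on real robots."""
    x, y, theta = pose
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return (x, y, theta)

# Drive straight 1 m, then turn 90 degrees in place.
pose = (0.0, 0.0, 0.0)
pose = dead_reckon(pose, v=1.0, omega=0.0, dt=1.0)          # -> (1.0, 0.0, 0.0)
pose = dead_reckon(pose, v=0.0, omega=math.pi / 2, dt=1.0)  # heading now pi/2
```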
10. Path Planning
• Generate a trajectory from the robot's current position to a goal point designated on the map
• Generate the robot's route by dividing the problem into global path planning and local path planning on the map
• Algorithms: A*, D*, RRT (Rapidly-exploring Random Tree), Probabilistic Roadmap, etc.
Credit : Machine Learning & Robotics / Geonhee Lee
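Of the algorithms listed above, A* is the usual starting point for global path planning. A minimal sketch on a toy 4-connected occupancy grid (the grid and all names are illustrative, not from the talk):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle).
    Returns a list of (row, col) cells from start to goal, or None."""
    def h(a, b):  # Manhattan distance: admissible for 4-connected moves
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start, goal), 0, start)]  # (f = g + h, g, cell)
    came_from, g = {}, {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:  # reconstruct path by walking parents backwards
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry, a cheaper route was found later
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g[cur] + 1
                if ng < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cur
                    heapq.heappush(open_heap, (ng + h((nr, nc), goal), ng, (nr, nc)))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))  # detours around the wall in row 1
```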
11. SLAM
Simultaneous Localization and Mapping
• Computational problem of constructing a map of an environment
while simultaneously keeping track of a robot’s location
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
13. SLAM
Mapping
• Scenarios in which a prior map is not available and needs to be built.
• Map can inform path planning or provide an intuitive visualization
for a human or robot.
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
18. Vision - Language
Vision + Language Application
• Image Captioning
Input: an image
Desired output: a natural-language caption, e.g. "A large bus sitting next to a very tall building." or "The man at bat readies to swing at the pitch while the umpire looks on."
19. Vision - Language
Vision + Language Deep Learning Architecture
• Image Captioning
Credit : https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/
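The architecture on this slide is typically a CNN encoder feeding an RNN decoder. A shape-level NumPy sketch of that wiring (random placeholder weights and a made-up vocabulary — this illustrates the data flow, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Toy "CNN encoder": in practice a pretrained CNN (e.g. ResNet) maps the
# image to a feature vector; here a single random projection stands in.
def encode_image(image):
    flat = image.reshape(-1)
    W_enc = rng.standard_normal((dim, flat.size)) * 0.01
    return np.tanh(W_enc @ flat)

# Toy "RNN decoder": the image feature initialises the hidden state,
# then words are emitted greedily one at a time.
vocab = ["<start>", "a", "bus", "next", "to", "building", "<end>"]
V = len(vocab)
emb = rng.standard_normal((V, dim)) * 0.01    # word embeddings
W_h = rng.standard_normal((dim, dim)) * 0.01  # hidden-to-hidden
W_x = rng.standard_normal((dim, dim)) * 0.01  # input-to-hidden
W_out = rng.standard_normal((V, dim)) * 0.01  # hidden-to-vocab logits

def decode(h, max_len=5):
    token, words = 0, []  # start from the <start> token
    for _ in range(max_len):
        h = np.tanh(W_h @ h + W_x @ emb[token])
        token = int(np.argmax(W_out @ h))     # greedy word choice
        if vocab[token] == "<end>":
            break
        words.append(vocab[token])
    return words

image = rng.standard_normal((8, 8, 3))
caption = decode(encode_image(image))
```

With random weights the output is gibberish; the point is only that the encoder's vector conditions every decoding step.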
20. Vision - Language
Vision + Language Application
• Visual Question Answering (VQA)
Input: an image plus a question, e.g. "What is the mustache made of?" or "Is this a vegetarian pizza?"
Desired output: an answer, e.g. "Bananas" / "No"
21. Vision - Language
Vision + Language Deep Learning Architecture
• Visual Question Answering(VQA)
Credit : https://arxiv.org/pdf/1505.00468v6.pdf
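The VQA architecture cited above fuses a CNN image feature with an LSTM question feature by pointwise multiplication before an answer classifier. A NumPy sketch of that fusion step (placeholder random weights; the CNN and LSTM are stubbed out):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size, n_answers = 32, 100, 10

def image_features(image):
    # Stand-in for a CNN feature vector of the image.
    W_img = rng.standard_normal((dim, image.size)) * 0.01
    return np.tanh(W_img @ image.reshape(-1))

def question_features(token_ids):
    # Stand-in for an LSTM question encoding: mean of word embeddings.
    emb = rng.standard_normal((vocab_size, dim)) * 0.01
    return np.tanh(emb[token_ids].mean(axis=0))

W_ans = rng.standard_normal((n_answers, dim)) * 0.01

def answer_scores(image, token_ids):
    # Pointwise multiplication fuses the two modalities before the
    # answer classifier, as in the original VQA architecture.
    fused = image_features(image) * question_features(token_ids)
    logits = W_ans @ fused
    return np.exp(logits) / np.exp(logits).sum()  # softmax over answers

probs = answer_scores(rng.standard_normal((8, 8, 3)), [3, 7, 42])
```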
22. Vision - Language Navigation
Evolution of Language and Vision datasets towards Actions
Credit : https://lvatutorial.github.io/
33. Vision - Language Navigation
Vision + Language + Actions → a 'Complete' Agent
• Vision: image / video understanding, 3D environment perception
• Language: instruction following, question answering, dialog
• Actions: camera motion, robotics / manipulation, APIs
35. 3D Environment
• Datasets: SUNCG (Song et al., 2017), Matterport3D (Chang et al., 2017), Stanford 2D-3D-S (Armeni et al., 2017)
• Environments: AI2-THOR (Kolve et al., 2017), MINOS (Savva et al., 2017), Gibson (Zamir et al., 2018), CHALET (Yan et al., 2018), House3D (Wu et al., 2017), HoME (Brodeur et al., 2018), VirtualHome (Puig et al., 2018), AdobeIndoorNav (Mo et al., 2018), Matterport3DSim (Anderson et al., 2018)
• Tasks & Metrics: EmbodiedQA (Das et al., 2018), Interactive QA (Gordon et al., 2018), Vision-Language Navigation (Anderson et al., 2018), Language grounding (Chaplot et al., 2017; Hermann & Hill et al., 2017), Visual Navigation (Zhu & Gordon et al., 2017; Savva et al., 2017; Wu et al., 2017)
• All of the above date from 2017 or later (!)
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
39. Paper (in project)
• House3D (Yi Wu et al., 2017): built the House3D environment and the RoomNav training model
• Gated Attention (Chaplot et al., 2017): gated attention module; the reference model for House3D RoomNav
• Embodied QA (Abhishek Das et al., 2017): first VQA + RL approach; built the EmbodiedQA dataset; hierarchical PACMAN training model; CVPR 2018
• FollowNet (P. Shah et al., 2017): conditioned-attention model using long (language) instructions; ICRA 2018
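The Gated-Attention module of Chaplot et al. (2017), used as the RoomNav reference model, projects the instruction embedding to one sigmoid gate per channel of the image feature map and multiplies them elementwise. A NumPy sketch (random placeholder shapes and weights, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_attention(feature_map, instruction_emb, W_gate):
    """Gated-Attention fusion (Chaplot et al., 2017): the instruction
    embedding is projected to one sigmoid gate per feature-map channel,
    broadcast over the spatial dimensions, and multiplied elementwise
    with the image feature map."""
    gates = 1.0 / (1.0 + np.exp(-(W_gate @ instruction_emb)))  # (channels,)
    return feature_map * gates[:, None, None]

C, H, Wd, D = 16, 7, 7, 32                 # channels, height, width, embed dim
feature_map = rng.standard_normal((C, H, Wd))   # stand-in CNN output
instruction = rng.standard_normal(D)            # stand-in GRU instruction embedding
W_gate = rng.standard_normal((C, D)) * 0.1
fused = gated_attention(feature_map, instruction, W_gate)
```

The fused map keeps the feature map's shape, so it drops into the policy network wherever the plain CNN output would go; the instruction simply scales each channel's importance.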
40. Paper
• Target Driven Visual Navi (Yuke Zhu et al., 2017): "Target-driven Visual Navigation in Indoor Scenes"; Siamese-style RL-based navigation model; ICRA 2017
• CMP (Gupta et al., 2017): "Cognitive Mapping and Planning for Visual Navigation"; value iteration network; CVPR 2017
• IQA (Gordon et al., 2018): "Visual Question Answering in Interactive Environments"; CVPR 2018
• VLN (Anderson et al., 2018): "Vision-and-Language Navigation"; CVPR 2018 spotlight