Look, Listen and Act
Written by 이의령, 김예찬, 양홍선
An agent that sees, listens, and acts
3D Environment HomeNavi
1
Project Introduction
2
Project Introduction
Home Navigation in a 3D Environment
• House(Indoor) 3D Dataset
• Reinforcement Learning Environment
• Performing instruction-driven tasks such as ‘Go to Kitchen’
3
Project Introduction – House3D
4
Project Introduction
Vision × Language × Actions → a ‘Complete’ Agent
• Vision: image / video understanding, 3D environment perception
• Language: instruction following, question answering, dialog
• Actions: camera motion, robotics / manipulation, APIs
5
Motivation
Target-driven Visual Navigation Model using Deep Reinforcement Learning
Y. Zhu et al., ICRA 2017
6
Motivation
7
Could this be applied to mobile robots?
Mobile Robot & Navigation
8
Mobile Robot
9
A mobile robot is a robot that is capable of locomotion.
- Wikipedia
Navigation (category → subcategory → techniques)
• Driving: Path Planning, Obstacle Avoidance, Recognizing the surroundings
• Localization & Mapping: Dead Reckoning, Landmark, SLAM
Credit : Machine Learning & Robotics / Geonhee Lee
Path Planning
10
• Generate a trajectory from the robot’s current position to a goal point specified on the map
• The path is produced in two stages: global path planning on the map and local path planning
• Algorithms: A*, D*, RRT (Rapidly-exploring Random Tree), Probabilistic Roadmap, etc. (a minimal A* sketch follows below)
Credit : Machine Learning & Robotics / Geonhee Lee
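As a concrete illustration of global path planning, here is a minimal A* sketch on a 2D occupancy grid with a Manhattan-distance heuristic. The grid, start, and goal below are made-up inputs for illustration, not part of the original material.

import heapq

def astar(grid, start, goal):
    """A* on a 2D occupancy grid (0 = free, 1 = obstacle).
    Returns a list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    open_heap = [(h(start), 0, start)]                         # (f = g + h, g, cell)
    came_from, g_cost = {}, {start: 0}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:                                        # reconstruct the trajectory
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))   # [(0,0), (0,1), (0,2), (1,2), (2,2), (2,1), (2,0)]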
SLAM
11
Simultaneous Localization and Mapping
• Computational problem of constructing a map of an environment
while simultaneously keeping track of a robot’s location
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
SLAM
12
Visual Localization
• Under the inaccurate GPS
• GPS-denied environment
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
SLAM
13
Mapping
• Scenarios in which a prior map is not available and needs to be built.
• Map can inform path planning or provide an intuitive visualization
for a human or robot.
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
Navigation
via Reinforcement Learning
14
RL based Navigation
15
Reinforcement Learning with Auxiliary Tasks (DeepMind, 2016)
Credit : https://deepmind.com/blog/reinforcement-learning-unsupervised-auxiliary-tasks/
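The core idea of auxiliary tasks is that the agent's shared representation is trained with extra self-supervised losses alongside the RL loss. The PyTorch sketch below is only a generic illustration of that pattern (it is not DeepMind's UNREAL architecture): a shared conv encoder feeds a policy/value head and an auxiliary reward-sign prediction head, and the losses are summed with an assumed weight; all inputs are random placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentWithAux(nn.Module):
    """Shared encoder + actor/critic heads + an auxiliary reward-prediction head."""
    def __init__(self, n_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 9 * 9, 256), nn.ReLU())
        self.policy = nn.Linear(256, n_actions)    # actor
        self.value = nn.Linear(256, 1)             # critic
        self.aux_reward = nn.Linear(256, 3)        # auxiliary task: sign of next reward (-, 0, +)

    def forward(self, obs):
        z = self.encoder(obs)
        return self.policy(z), self.value(z), self.aux_reward(z)

model = AgentWithAux()
obs = torch.randn(8, 3, 84, 84)                    # dummy batch of frames
actions = torch.randint(0, 6, (8,))
returns = torch.randn(8)                           # dummy returns
reward_sign = torch.randint(0, 3, (8,))            # dummy auxiliary targets

logits, values, aux_logits = model(obs)
log_probs = F.log_softmax(logits, dim=-1)[range(8), actions]
advantage = returns - values.squeeze(-1).detach()
policy_loss = -(log_probs * advantage).mean()
value_loss = F.mse_loss(values.squeeze(-1), returns)
aux_loss = F.cross_entropy(aux_logits, reward_sign)  # auxiliary loss shares the encoder
loss = policy_loss + 0.5 * value_loss + 0.1 * aux_loss
loss.backward()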
RL based Navigation
16
Model Architecture
Credit : https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/
Vision – Language – Navigation
Deep RL based Navigation
17
Vision - Language
Vision + Language Application
• Image Captioning
Input: an image
Desired output: a caption, e.g.
• “The man at bat readies to swing at the pitch while the umpire looks on.”
• “A large bus sitting next to a very tall building.”
18
Vision - Language
Vision + Language Deep Learning Architecture
• Image Captioning
Credit : https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/
19
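The architecture in the figure follows the standard encoder-decoder recipe: a CNN encodes the image into a feature vector and an LSTM decodes it into a word sequence. Below is a minimal PyTorch sketch of that pattern with a toy encoder, assumed dimensions, and random data; it is illustrative, not the tutorial's exact model.

import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    """Tiny CNN encoder + LSTM decoder, trained with teacher forcing."""
    def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(                    # stand-in for a pretrained CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim))
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        img_feat = self.encoder(images).unsqueeze(1)     # (B, 1, E): image as the first "token"
        words = self.embed(captions[:, :-1])             # shifted caption tokens
        seq = torch.cat([img_feat, words], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                          # logits over the vocabulary

model = CaptionModel()
images = torch.randn(4, 3, 64, 64)                       # dummy image batch
captions = torch.randint(0, 1000, (4, 12))               # dummy token ids
logits = model(images, captions)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1000), captions.reshape(-1))      # next-word prediction loss
print(logits.shape)                                      # torch.Size([4, 12, 1000])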
Vision - Language
20
Vision + Language Application
• Visual Question Answering(VQA)
Input: an image and a question
Desired output: an answer, e.g.
• Q: What is the mustache made of? → A: Bananas
• Q: Is this a vegetarian pizza? → A: No
Vision - Language
Vision + Language Deep Learning Architecture
• Visual Question Answering(VQA)
Credit : https://arxiv.org/pdf/1505.00468v6.pdf
21
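The VQA baseline in the referenced paper fuses an LSTM encoding of the question with CNN image features by projecting both into a common space, multiplying them elementwise, and classifying over a fixed answer set. The PyTorch sketch below shows that fusion pattern; the dimensions and dummy inputs are assumptions for illustration.

import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    """Question LSTM + image-feature projection, fused by elementwise product."""
    def __init__(self, vocab_size=5000, img_dim=4096, common_dim=1024, n_answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)
        self.lstm = nn.LSTM(300, common_dim, batch_first=True)
        self.img_proj = nn.Sequential(nn.Linear(img_dim, common_dim), nn.Tanh())
        self.classifier = nn.Sequential(
            nn.Linear(common_dim, common_dim), nn.ReLU(),
            nn.Linear(common_dim, n_answers))

    def forward(self, img_feat, question_ids):
        _, (h, _) = self.lstm(self.embed(question_ids))   # final hidden state encodes the question
        q = h[-1]                                          # (B, common_dim)
        v = self.img_proj(img_feat)                        # (B, common_dim)
        return self.classifier(q * v)                      # elementwise fusion -> answer logits

model = SimpleVQA()
img_feat = torch.randn(4, 4096)                            # e.g. precomputed CNN features
question = torch.randint(0, 5000, (4, 14))                 # dummy token ids
print(model(img_feat, question).shape)                     # torch.Size([4, 1000])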
Vision - Language Navigation
22
Evolution of Language and Vision datasets towards Actions
Credit : https://lvatutorial.github.io/
Vision - Language Navigation (23–32)
Evolution of Language and Vision datasets towards Actions (successive builds of the same figure)
Vision - Language Navigation
33
Vision × Language × Actions → a ‘Complete’ Agent
• Vision: image / video understanding, 3D environment perception
• Actions: camera motion, robotics / manipulation, APIs
• Language: instruction following, question answering, dialog
3D Environment
34
Datasets × Environments × Tasks & Metrics
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
3D Environment
35
Datasets × Environments × Tasks & Metrics
• Datasets: SUNCG (Song et al., 2017), Matterport3D (Chang et al., 2017), Stanford 2D-3D-S (Armeni et al., 2017)
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
3D Environment
36
Datasets × Environments × Tasks & Metrics
• Datasets: SUNCG (Song et al., 2017), Matterport3D (Chang et al., 2017), Stanford 2D-3D-S (Armeni et al., 2017)
• Environments: AI2-THOR (Kolve et al., 2017), MINOS (Savva et al., 2017), Gibson (Zamir et al., 2018), CHALET (Yan et al., 2018), House3D (Wu et al., 2017), HoME (Brodeur et al., 2018), VirtualHome (Puig et al., 2018), AdobeIndoorNav (Mo et al., 2018), Matterport3DSim (Anderson et al., 2018)
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
3D Environment
37
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
Datasets × Environments × Tasks & Metrics
• Datasets: SUNCG (Song et al., 2017), Matterport3D (Chang et al., 2017), Stanford 2D-3D-S (Armeni et al., 2017)
• Environments: AI2-THOR (Kolve et al., 2017), MINOS (Savva et al., 2017), Gibson (Zamir et al., 2018), CHALET (Yan et al., 2018), House3D (Wu et al., 2017), HoME (Brodeur et al., 2018), VirtualHome (Puig et al., 2018), AdobeIndoorNav (Mo et al., 2018), Matterport3DSim (Anderson et al., 2018)
• Tasks & Metrics: EmbodiedQA, Interactive QA (Gordon et al., 2018), Vision-Language Navigation (Anderson et al., 2018), Language grounding (Chaplot et al., 2017; Hermann & Hill et al., 2017), Visual Navigation (Zhu & Gordon et al., 2017; Savva et al., 2017; Wu et al., 2017)
3D Environment
38
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
Same Datasets × Environments × Tasks & Metrics figure as the previous slide; note that everything here dates from 2017 or later (>= 2017 !)
Papers (in project)
39
• House3D (Yi Wu et al., 2017): builds the House3D environment; RoomNav training model
• Gated Attention (Chaplot et al., 2017): Gated-Attention module; the reference model for House3D RoomNav
• Embodied QA (Abhishek Das et al., 2017): first VQA + RL approach; builds the Embodied QA dataset; hierarchical model; PACMAN training model; CVPR 2018
• FollowNet (P. Shah et al., 2017): conditioned-attention model; uses long (natural-language) instructions; ICRA 2018
(arXiv and code links are on the original slide)
Papers
40
• Target-Driven Visual Navigation (Yuke Zhu et al., 2017): target-driven visual navigation in indoor scenes; Siamese-style RL navigation model; ICRA 2017
• CMP (Gupta et al., 2017): Cognitive Mapping and Planning for visual navigation; Value Iteration Network; CVPR 2017
• IQA (Gordon et al., 2018): Visual Question Answering in interactive environments; CVPR 2018
• VLN (Anderson et al., 2018): Vision-and-Language Navigation; CVPR 2018 spotlight
(arXiv and code links are on the original slide)
Paper
41
Under the heading of Vision-Language Navigation,
papers and environments have been appearing steadily since 2017.
Project Experiment
42
Dataset
+
Environment
+
Task
43
Dataset: SUNCG
44
Dataset: SUNCG
Room-type labels on the example scenes: Bedroom, Toilet/Bathroom, Garage, Bedroom, Bedroom
45
Dataset: SUNCG
45,622 human-designed 3D scenes
On average 8.9 rooms and 1.3 floors per scene
20 room types (bedroom, living room, …)
80 object categories (cup, chair, …)
46
Environment: House3D
47
Tasks: RoomNav, Embodied QA
RoomNav
48
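To make the RoomNav setup concrete: the agent receives an image observation plus a target-room instruction, moves with discrete actions, and is rewarded for reaching the target room. The class below is a self-contained, gym-style stand-in with placeholder images and made-up reward shaping; it is not the actual House3D/RoomNav API.

import random
import numpy as np

class ToyRoomNav:
    """Illustrative RoomNav-style environment: observation = (RGB frame, instruction)."""
    ACTIONS = ["forward", "backward", "turn_left", "turn_right", "stop"]
    ROOMS = ["kitchen", "bedroom", "bathroom", "living_room"]

    def reset(self):
        self.target = random.choice(self.ROOMS)
        self.steps = 0
        return {"rgb": np.zeros((120, 90, 3), dtype=np.uint8),     # placeholder frame
                "instruction": f"go to the {self.target}"}

    def step(self, action):
        self.steps += 1
        reached = random.random() < 0.01            # stand-in for a real room-location check
        reward = 10.0 if reached else -0.1          # success bonus + small time penalty (illustrative)
        done = reached or self.steps >= 200
        obs = {"rgb": np.random.randint(0, 256, (120, 90, 3), dtype=np.uint8),
               "instruction": f"go to the {self.target}"}
        return obs, reward, done, {"success": reached}

env = ToyRoomNav()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = random.choice(env.ACTIONS)             # random policy as a placeholder agent
    obs, reward, done, info = env.step(action)
    total += reward
print(obs["instruction"], "| episode return:", round(total, 1))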
Tasks: RoomNav, Embodied QA
Embodied QA
49
RoomNav
50
Models
51
Models
1
2
52
53
Gated LSTM
Gated LSTM (Vizdoom)
54
Gated LSTM
55
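The gating idea behind the Gated-Attention model: the instruction embedding is passed through a sigmoid to produce one gate per visual channel, and the convolutional feature maps are multiplied channel-wise by those gates before the recurrent policy. Below is a minimal PyTorch sketch of that mechanism with assumed sizes and random inputs, not the paper's exact implementation.

import torch
import torch.nn as nn

class GatedAttentionPolicy(nn.Module):
    """Instruction-conditioned gating of image feature maps, followed by a GRU policy."""
    def __init__(self, vocab_size=100, n_actions=5, channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, channels, 4, stride=2), nn.ReLU())
        self.instr_embed = nn.Embedding(vocab_size, 32)
        self.instr_encoder = nn.GRU(32, 32, batch_first=True)
        self.gate = nn.Linear(32, channels)                    # one gate per feature-map channel
        self.policy_rnn = nn.GRUCell(channels * 9 * 9, 256)
        self.actor = nn.Linear(256, n_actions)

    def forward(self, frame, instr_ids, h):
        feat = self.conv(frame)                                # (B, C, 9, 9)
        _, instr = self.instr_encoder(self.instr_embed(instr_ids))
        g = torch.sigmoid(self.gate(instr[-1]))                # (B, C), values in (0, 1)
        gated = feat * g.unsqueeze(-1).unsqueeze(-1)           # channel-wise gating
        h = self.policy_rnn(gated.flatten(1), h)
        return self.actor(h), h

model = GatedAttentionPolicy()
frame = torch.randn(2, 3, 84, 84)                              # dummy observation
instr = torch.randint(0, 100, (2, 6))                          # dummy instruction tokens
h = torch.zeros(2, 256)
logits, h = model(frame, instr, h)
print(logits.shape)                                            # torch.Size([2, 5])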
56
Look / Listen / Act
57
Look / Listen / Act
See
58
Look / Listen / Act
See
Listen
59
Look / Listen / Act
See
Listen
Act
60
61
62
63
64
65
66
67
House3D vs. Vizdoom
68
House3D vs. Vizdoom: 10% ?
69
Experimental Results of
RoomNav Paper
70
Return to Vizdoom !
71
Difficulty Level
72
73
30 hours
74
More time!
75
Applying the model trained on Easy → to Hard
76
The agent knows the relationship between the visual input and the instruction, but...
77
Training
78
It starts to explore
79
Navigation + House3D = Ultra-hard
80
Trying stage-wise (curriculum) training in House3D
81
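A sketch of the stage-wise (curriculum) idea: train on the easiest setting first and only move to the next difficulty once the success rate clears a threshold. Everything below, including the difficulty list, threshold, and stubbed train/evaluate functions, is an illustrative skeleton rather than the actual training code.

import random

DIFFICULTIES = ["easy", "medium", "hard"]        # e.g. increasing distance to the target room
SUCCESS_THRESHOLD = 0.6                          # promote once 60% of eval episodes succeed

def train_one_round(policy, difficulty, episodes=100):
    """Stub: run `episodes` training episodes at this difficulty and update the policy."""
    policy[difficulty] = policy.get(difficulty, 0) + episodes
    return policy

def evaluate(policy, difficulty, episodes=20):
    """Stub: return the fraction of successful eval episodes (random stand-in)."""
    experience = policy.get(difficulty, 0)
    return min(1.0, experience / 500) * random.uniform(0.8, 1.0)

policy = {}                                      # placeholder for model parameters
for level in DIFFICULTIES:
    success = 0.0
    while success < SUCCESS_THRESHOLD:           # keep training until this stage is solved
        policy = train_one_round(policy, level)
        success = evaluate(policy, level)
    print(f"{level}: promoted with success rate {success:.2f}")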
82
83
Questions?
84