Code available at: http://vision.snu.ac.kr/projects/cb
Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty
ICML 2019
Youngjin Kim
Hyunwoo Kim*
Wontae Nam*
Jihoon Kim
Gunhee Kim
(*equal contribution)
Exploitation vs. Exploration
Image source: UC Berkeley AI course slide, lecture 11
Extrinsic Reward vs. Intrinsic Reward
§ Extrinsic: +500 SCORE for getting an item! −150 SCORE for stepping on a bomb :(
§ Intrinsic: +200 MOTIVATION SCORE, as I’ve never been to this place! −150 MOTIVATION SCORE, as I’ve been here too many times.
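To make the distinction concrete, here is a minimal sketch of an agent that adds a count-based intrinsic (novelty) bonus to the extrinsic reward from the environment, so that a state earns less motivation each time it is revisited. The 1/sqrt(N(s)) bonus and the weight `beta` are assumptions chosen for illustration (a classical count-based scheme); this is not the Curiosity-Bottleneck reward.

```python
import math
from collections import Counter


class CountBonus:
    """Count-based novelty bonus: rarely visited states earn more intrinsic reward.

    Illustrative only; `beta` and the 1/sqrt(count) form are assumptions.
    """

    def __init__(self, beta: float = 1.0):
        self.beta = beta
        self.counts = Counter()

    def intrinsic(self, state) -> float:
        # Record the visit, then reward inversely to how often the state was seen.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])

    def total(self, state, extrinsic: float) -> float:
        # The policy is trained on the sum of extrinsic and intrinsic rewards.
        return extrinsic + self.intrinsic(state)


bonus = CountBonus(beta=1.0)
print(bonus.total((0, 0), extrinsic=0.0))    # first visit: 1.0
print(bonus.total((0, 0), extrinsic=0.0))    # second visit: about 0.707
print(bonus.total((3, 4), extrinsic=500.0))  # new state with an item: 501.0
```

A bonus of this kind is task-agnostic: it rewards any unfamiliar state, whether or not it matters for the task.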
Previous Research on Exploration
Anything Novel
Source for Novelty
Task-irrelevant Novelty
Task-relevant Novelty
Our Research
Task-irrelevant Novelty
Task-relevant Novelty
Problematic situation: Exploration under Distraction
1. Distracting environments are widespread
§ Real-world observations contain novel but task-irrelevant information.
[Figure: navigating robot; (a) Known Place, (b) Known Place with Strangers]
Problematic situation: Exploration under Distraction
2. Degeneration of prior novelty-based exploration strategies
§ Due to task-agnostic intrinsic reward
§ Need mechanisms to prioritize task-relevant novelty
[Figure: navigating robot; (a) Known Place: not novel, (b) Known Place with Strangers: novel]
Our approach: Curiosity-Bottleneck
Quantify the ‘Degree of Compression’ using a compressive value network
[Figure: observation $x_t$ → Compressor → Value Predictor ($y_t$); intrinsic reward $r_t^i$ and external reward $r_t^e$ → Policy → action $a_t$ → Environment]
Our approach: Curiosity-Bottleneck
Compressor
§ Encode rare $x$ to a lengthy code and common $x$ to a shorter code
§ Discard information about $x$ during compression
[Figure: pipeline diagram ($x_t$ → Compressor → Value Predictor ($y_t$); intrinsic and external rewards → Policy → $a_t$ → Environment) with the Compressor highlighted]
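The compression idea follows classical source coding: under an ideal code, an observation with probability $p(x)$ costs $-\log_2 p(x)$ bits, so rare observations get long codes and common ones short codes, and the average code length equals the entropy. A minimal sketch with a hypothetical distribution over observations (the observation names are illustrative, not from the paper):

```python
import math


def code_lengths(probs: dict) -> dict:
    """Ideal (Shannon) code length in bits for each symbol: -log2 p(x)."""
    return {x: -math.log2(p) for x, p in probs.items() if p > 0}


def average_code_length(probs: dict) -> float:
    """Expected code length under p, which equals the entropy H in bits."""
    lengths = code_lengths(probs)
    return sum(probs[x] * lengths[x] for x in lengths)


# Hypothetical observation frequencies: one common place, progressively rarer ones.
p = {"known_room": 0.5, "hallway": 0.25, "new_room": 0.125, "stranger": 0.125}
print(code_lengths(p))         # {'known_room': 1.0, 'hallway': 2.0, 'new_room': 3.0, 'stranger': 3.0}
print(average_code_length(p))  # 1.75
```

The length of an observation's code is therefore a natural novelty signal: the rarer the observation, the more bits it takes to describe.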
Our approach: Curiosity-Bottleneck
Value Predictor
§ Prevent the Compressor from discarding task-related information
[Figure: pipeline diagram ($x_t$ → Compressor → Value Predictor ($y_t$); intrinsic and external rewards → Policy → $a_t$ → Environment) with the Value Predictor highlighted]
Our approach: Curiosity-Bottleneck
1. Objective Function
§ Minimize the average code-length of the representation $Z$
§ Discard information about the observation $X$
$\min\; H(Z) - H(Z \mid X) = \min\; I(X;Z)$
§ Preserve information related to the value estimate $Y$: $\max\; I(Z;Y)$
$L = -I(Z;Y) + \beta\, I(X;Z)$
2. Intrinsic Reward: Per-instance Mutual Information
$r^i(x) = \int p(z \mid x) \log \frac{p(x,z)}{p(x)\,p(z)}\, dz = \mathrm{KL}[p(Z \mid x) \,\|\, p(Z)]$, since $p(x,z)/p(x) = p(z \mid x)$
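The per-instance mutual information can be checked on a small discrete example. Because $p(x,z)/(p(x)p(z)) = p(z \mid x)/p(z)$, the reward for $x$ is the KL divergence between $p(z \mid x)$ and the marginal $p(z)$, and averaging it over $p(x)$ recovers $I(X;Z)$. The joint table below is hypothetical; in Curiosity-Bottleneck these distributions are learned by networks, not tabulated.

```python
import numpy as np


def per_instance_mi(joint: np.ndarray) -> np.ndarray:
    """r(x) = sum_z p(z|x) log[p(x,z) / (p(x) p(z))] for a discrete joint p(x, z).

    Rows index x, columns index z; every row and column must have positive mass.
    Values are in nats.
    """
    joint = joint / joint.sum()
    p_x = joint.sum(axis=1, keepdims=True)   # marginal p(x), shape (X, 1)
    p_z = joint.sum(axis=0, keepdims=True)   # marginal p(z), shape (1, Z)
    p_z_given_x = joint / p_x                # conditional p(z|x)
    # Zero-probability cells contribute nothing, so use log(1) = 0 there.
    ratio = np.where(joint > 0, joint / (p_x * p_z), 1.0)
    return (p_z_given_x * np.log(ratio)).sum(axis=1)


# Hypothetical joint: common x0 spreads over both codes, rare x1 is specific.
joint = np.array([[0.40, 0.40],
                  [0.02, 0.18]])
r = per_instance_mi(joint)
p_x = joint.sum(axis=1) / joint.sum()
print(r)                # the rare observation x1 receives the larger reward
print((p_x * r).sum())  # averaging over p(x) gives I(X;Z)
```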
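Our approach: Curiosity-Bottleneck
3. Approximation
Variational Information Bottleneck with Gaussian assumptions
$L_{\theta,\phi} = \mathbb{E}_{x,y}\big[-\log q_\phi(y \mid z) + \beta\, \mathrm{KL}[p_\theta(Z \mid x) \,\|\, q(Z)]\big]$, with $z \sim p_\theta(Z \mid x)$
$r^i(x) = \mathrm{KL}[p_\theta(Z \mid x) \,\|\, q(Z)]$
[Figure: $x_t$ → Compressor ($\mu_\theta, \sigma_\theta$) → $z_t \sim p_\theta(Z \mid x_t)$ → Value Predictor ($\mu_\phi, \sigma_\phi$) → $-\log q_\phi(y_t \mid z_t)$; the compression term $\mathrm{KL}[p_\theta(Z \mid x_t) \,\|\, q(Z)]$ gives the intrinsic reward $r_t^i$, and the two terms are combined to form $L_{\theta,\phi}$]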
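Under Gaussian assumptions both terms have closed forms. The sketch below assumes a diagonal Gaussian encoder $p_\theta(Z \mid x) = \mathcal{N}(\mu_\theta, \sigma_\theta^2)$, a standard normal prior $q(Z) = \mathcal{N}(0, I)$, and a Gaussian value predictor $q_\phi(y \mid z) = \mathcal{N}(\mu_\phi, \sigma_\phi^2)$. The prior choice, the single-sample estimate, and the toy value head `toy_value_head` are assumptions for illustration, not the authors' code (which is linked from the project page).

```python
import numpy as np


def kl_to_standard_normal(mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Closed-form KL[N(mu, diag(sigma^2)) || N(0, I)], summed over latent dims."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma), axis=-1)


def gaussian_nll(y: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """-log N(y; mu, sigma^2) for a scalar target per example."""
    return 0.5 * np.log(2.0 * np.pi * sigma**2) + (y - mu) ** 2 / (2.0 * sigma**2)


def cb_loss_and_reward(mu_z, sigma_z, value_head, y, beta, rng):
    """One-sample estimate of L = E[-log q_phi(y|z) + beta * KL], plus r_i = KL.

    mu_z, sigma_z stand in for the Compressor outputs; `value_head` is a
    hypothetical callable z -> (mu_y, sigma_y) standing in for the Value Predictor.
    """
    eps = rng.standard_normal(mu_z.shape)
    z = mu_z + sigma_z * eps                   # reparameterized z ~ p_theta(Z|x)
    mu_y, sigma_y = value_head(z)
    kl = kl_to_standard_normal(mu_z, sigma_z)  # compression term, also r_i(x)
    loss = np.mean(gaussian_nll(y, mu_y, sigma_y) + beta * kl)
    return loss, kl


def toy_value_head(z):
    # Hypothetical value predictor: linear mean, unit standard deviation.
    return z.sum(axis=-1), np.ones(z.shape[0])


rng = np.random.default_rng(0)
mu_z = np.array([[0.0, 0.0], [2.0, -1.0]])  # 2 observations, 2 latent dims
sigma_z = np.array([[1.0, 1.0], [0.3, 0.5]])
loss, r_i = cb_loss_and_reward(mu_z, sigma_z, toy_value_head,
                               y=np.array([0.0, 1.0]), beta=1e-3, rng=rng)
print(r_i)  # first row equals the prior, so its reward is 0.0
```

The same KL that regularizes the encoder is returned per observation as the intrinsic reward, so an observation whose code the Compressor cannot pull toward the prior earns a larger bonus.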
Proof of concept: static images
[Figure: example inputs with three distraction types: Random Box, Object, Pixel Noise]
Proof of concept: static images
Detects novelty while being robust to distraction
[Figure: (a) Input (Random Box, Object, Pixel Noise), (b) Ideal, (c) CB, (d) CB-noKL, (e) RND, (f) SimHash; each panel is plotted over a novelty axis and a distraction axis, both ranging from 0.1 to 0.9]
Experiment: Treasure Hunt
§ The agent is depicted as a circle
§ An item (triangle) with a reward is hidden somewhere
§ The item appears only when the agent is nearby
§ Once the agent obtains an item, the next item is spawned in another area (also hidden)
§ The traces (pentagons) of eaten items remain
§ Get the maximum score!
[Figure: Outline of the game; Example of the game play]
Experiment: Treasure Hunt
Two types of onset conditions for distraction
§ Movement condition: distraction appears when the agent stays in the same location
§ Location condition: distraction appears when the agent stays in the corners of the map
Experiment: Treasure Hunt
CB consistently outperforms the baselines under different distraction settings
[Figure: mean episodic reward (x-axis scale ×1e6) for CB, CB-noKL, RND, Dynamics, and SimHash; (a) Movement Condition, (b) Location Condition]
Experiment: Treasure Hunt
Illustration of the adaptive exploration strategy
[Figure: encoder distributions $p_\theta(Z \mid x)$ for two observations compared with the prior $q(Z)$ over the range of experiences, together with the gradients $\nabla \mathrm{KL}$ and $-\nabla \log q_\phi$ and the target value $y$ vs. its prediction; (a) Early Training Steps, (b) After Collecting Rewards]
Experiment: Treasure Hunt
The adaptive exploration strategy
The compression loss term $\mathrm{KL}[p_\theta(Z \mid x) \,\|\, q(Z)]$ induces task-agnostic exploration in early stages
[Figure: Grad-CAM visualization; (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash]
Experiment: Treasure Hunt
The adaptive exploration strategy
The value prediction loss term $-\log q_\phi(y \mid z)$ induces task-specific exploration after collecting external rewards
[Figure: Grad-CAM visualization; (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash]
Experiment: Atari Hard-Exploration Games
[Figure: results on Gravitar, Solaris, and Montezuma, with distraction and without distraction, comparing CB, CB-noKL, RND, Dynamics, and SimHash]
Contributions
• First work to discriminate information by task relevance
→ Focuses on task-relevant novelty and filters out distracting information
• Uses the information bottleneck as a novelty measure
→ The KL-divergence term serves as the degree of compression
• Extensive experiments
→ A custom grid-world environment showing situations where previous methods suffer
→ Atari environments for generality
• Psychologically plausible
