Department of Computer
Science and Engineering
IIT Kharagpur
Imitation Learning: Learning to
Act like Humans from Humans
SSN College of Engineering – Faculty Development Program Talk
10:45-12:15 IST, 25 Nov 2017
Anirban Santara
santara.github.io
About me
Anirban Santara
Google India Ph.D. Fellow at
IIT Kharagpur (2015-Present)
Graduate Research Intern at
Intel Labs for Autonomous
Driving (2017-Present)
B.Tech. in Electronics and
Electrical Communication
Engineering from IIT
Kharagpur in 2015
Contents
1. Building the motivation
2. Problem definition and Different Approaches to Solution
3. Issues of Safety and Reliability
Description of the Imitation
Learning Problem
Imitation Learning
Imitation Learning techniques aim to mimic human behavior at a given task [1].
[1] Hussein, Ahmed, et al. "Imitation Learning: A Survey of Learning Methods." ACM Computing Surveys (CSUR) 50.2 (2017): 21.
Image source: GRASP Lab, University of Pennsylvania
Why should you care?
• Imitation learning methods are rooted in neuroscience and form an important part of learning in humans
• They make it possible to teach robots complex tasks with minimal expert knowledge of the tasks
• No need for explicit programming or task-specific reward function design
• It's high time!
  • Modern sensors can collect and transmit high volumes of data at high speed
  • High-performance computing is cheaper, more capable and more ubiquitous than ever
  • Virtual Reality systems, often considered the best portal for human-machine interaction, are widely available
Example Application Areas
Autonomous Driving
No more accidents due to human error. No more traffic jams.
Robotic Surgery
Complex Actions in Critical Situations – Accurate. Every time.
Industrial Automation
Efficiency. Precise Quality Control. Safety.
Assistive Robotics
Elderly Care. Rehabilitation. Special Needs.
Conversational Agents
Assistance. Recommendation. Therapy.
Approaches to Solution
A quick primer on Machine Learning
Reference application – Driving a Racing Car
State variables (X):
• Position on the track
• Distance from track edges along different directions
• Direction of heading
• Current speed
Action variables (Y):
• Steering
• Acceleration
• Brake
Comparison of ML paradigms
Supervised Learning
• Would require training examples of the form $\{(X_i, Y_i)\}_{i=1}^{N}$
• where $Y_i$ is the true/correct action that must be taken in state $X_i$
Unsupervised Learning
• Works only with the input state information $X_i$
• Does not use any kind of feedback from the environment regarding the performance of the agent
Reinforcement Learning
• Requires feedback from the environment in the form of reward signals
• Reward signals might be sparse and delayed
• But they should indicate the quality of the actions taken by the agent in different states
e.g. +1 if the car makes progress, -1 if it comes to a halt, -10 if it bumps into an obstacle, +100 if it finishes the race
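The example reward signal for the racing car can be written down as a small function. This is only a sketch: the function name, the boolean event flags and the priority order among events (finish, then collision, then halt, then progress) are assumptions made for illustration; only the four numeric values come from the slide.

```python
def racing_reward(made_progress: bool, halted: bool,
                  collided: bool, finished: bool) -> int:
    """Toy per-step reward using the example values from the slide.

    The order in which events are checked is an assumption: the slide
    does not say what happens when several events occur in one step.
    """
    if finished:
        return 100   # finished the race
    if collided:
        return -10   # bumped into an obstacle
    if halted:
        return -1    # came to a halt
    if made_progress:
        return 1     # made progress along the track
    return 0         # nothing noteworthy happened (assumed default)
```

Note how the large reward for finishing only arrives at the very end of an episode, which is what "sparse and delayed" refers to.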
Problem Setting
Our agent has to achieve its goal by taking a sequence of actions in an environment whose states change in response to the agent's actions.
Diagram: Agent → Action → Environment → New State → Agent
Mathematical Formulation
Markov Decision Process (MDP)
Imitation Learning problems are often specified in terms of a Markov Decision Process (MDP). An MDP is defined as $\mathcal{M} = (S, A, T, r, \rho_0, \gamma)$
• State space $S$: set of all possible states/configurations of the environment
• Action space $A$: set of all possible actions
• Transition probability $T$: $T(s_t, a_t) = P(s_{t+1} \mid s_t, a_t)$, a distribution over next states for each state-action pair
• Reward function $r: S \times A \to \mathbb{R}$; we write $r(s_t, a_t) = r_t$
• Initial state distribution $\rho_0$: $\rho_0(s) = P(s_0 = s)$
• Temporal discount factor $\gamma$
"Markov" because it assumes:
$P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_t, a_t) = T(s_t, a_t)$
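To make the tuple concrete, here is a minimal sketch of a tabular MDP in Python. The class name `TabularMDP`, the two-state racing toy and all of its probabilities and rewards are hypothetical and chosen only for illustration. The Markov assumption shows up in `step`, which samples the next state from the current state and action alone.

```python
import random
from dataclasses import dataclass


@dataclass
class TabularMDP:
    states: list
    actions: list
    T: dict       # T[(s, a)] -> {next_state: probability}
    r: dict       # r[(s, a)] -> reward
    rho0: dict    # rho0[s] -> probability that s is the initial state
    gamma: float  # temporal discount factor

    def reset(self, rng):
        """Sample s_0 from the initial state distribution rho_0."""
        states, probs = zip(*self.rho0.items())
        return rng.choices(states, weights=probs)[0]

    def step(self, s, a, rng):
        """Sample s_{t+1} ~ T(s_t, a_t) and return it with r(s_t, a_t)."""
        next_states, probs = zip(*self.T[(s, a)].items())
        return rng.choices(next_states, weights=probs)[0], self.r[(s, a)]


# Hypothetical toy problem: the numbers below are made up.
toy = TabularMDP(
    states=["on_track", "off_track"],
    actions=["steer", "coast"],
    T={
        ("on_track", "steer"): {"on_track": 0.9, "off_track": 0.1},
        ("on_track", "coast"): {"on_track": 0.6, "off_track": 0.4},
        ("off_track", "steer"): {"on_track": 0.5, "off_track": 0.5},
        ("off_track", "coast"): {"off_track": 1.0},
    },
    r={
        ("on_track", "steer"): 1.0,
        ("on_track", "coast"): 1.0,
        ("off_track", "steer"): -1.0,
        ("off_track", "coast"): -10.0,
    },
    rho0={"on_track": 1.0},
    gamma=0.99,
)
```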
Some more definitions
• Policy $\pi: S \to A$: a function that predicts an action for a given state
• Trajectory $\tau$: a sequence of $(s_t, a_t)$ tuples that describes an episode of experiences of an agent as it executes a policy.
$\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$
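The two definitions can be tied together with a short rollout sketch: executing a policy in an environment produces a trajectory. The function name `rollout`, the deterministic transition function and the 1-D lane-keeping toy are assumptions for illustration only.

```python
def rollout(policy, transition, s0, horizon):
    """Execute `policy` for `horizon` steps and return
    tau = [s_0, a_0, s_1, a_1, ..., s_T] as a flat list."""
    tau = [s0]
    s = s0
    for _ in range(horizon):
        a = policy(s)          # pi: S -> A
        s = transition(s, a)   # deterministic next state (an assumption)
        tau.extend([a, s])
    return tau


# Hypothetical toy: state = lateral offset from the lane center,
# action = a unit steering correction towards the center.
def policy(s):
    return -1 if s > 0 else (1 if s < 0 else 0)


def transition(s, a):
    return s + a


tau = rollout(policy, transition, s0=3, horizon=4)
```

With a horizon of T steps the trajectory holds T + 1 states and T actions, matching the form of $\tau$ above.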
Approaches to Imitation Learning
Broad Categories
• Learning from a dataset of expert demonstrations
  • Behavioral Cloning
  • Apprenticeship Learning
• Active learning with an expert
Learning from a Dataset of
Expert Demonstrations
Problem Definition
• Given: a dataset of trajectories demonstrated by an expert: $\{\tau_i\}_{i=1}^{N}$
where each trajectory is a sequence of states and actions: $\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$
• Goal: Find a policy $\pi^*$ that achieves "expert-like performance"
Behavioral Cloning
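Supervised learning of a mapping from states to the expert's actions in those states
Diagram: the state $x = (x_1, x_2, \ldots, x_n)$ is fed to a model that predicts an action $a$. The loss is a statistical divergence between the predicted action $a$ and the expert action $a_{\text{expert}}$, and it is minimized w.r.t. the model parameters.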
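As a minimal sketch of behavioral cloning, the snippet below fits a linear policy to expert state-action pairs by least squares. Several things here are assumptions not taken from the talk: the model is linear, squared error stands in for the "statistical divergence", and the expert data are synthetic (a made-up controller that steers against lateral offset and heading error).

```python
import numpy as np


def fit_behavioral_cloning(states, expert_actions):
    """Least-squares fit of a linear policy a = [x, 1] @ W to expert data."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # add bias column
    W, *_ = np.linalg.lstsq(X, expert_actions, rcond=None)
    return W


def act(W, state):
    """Predict the action for a single state with the cloned policy."""
    return np.append(state, 1.0) @ W


# Synthetic, noiseless expert demonstrations (hypothetical controller).
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2))                           # [offset, heading error]
expert_actions = states @ np.array([[-0.5], [-1.0]]) + 0.1   # steering command

W = fit_behavioral_cloning(states, expert_actions)
```

In practice the model is typically a neural network trained by gradient descent, but the structure is the same: states in, expert actions as targets, a loss minimized over the model parameters.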
Pros and Cons of Behavioral Cloning
• Advantages:
  • Simplicity!
• Drawbacks:
  • Fails to work well with limited data
  • Assumes that observations are i.i.d. and learns to fit single time-step decisions. This leads to the problem of compounding error due to covariate shift: small mistakes drive the agent into states absent from the demonstrations, where it errs further
Apprenticeship Learning
Reinforcement Learning
Reinforcement Learning refers to learning through trial and error using feedback from the environment.
Diagram: Agent → Action → Environment → Reward, New State → Agent
Goal of RL
Find a policy $\pi^*$ that maximizes the expectation of the reward function $R(\tau)$ over trajectories $\tau$
$\pi^* = \arg\max_{\pi} \mathbb{E}_{\tau}[R(\tau)]$
The reward of a trajectory, $R(\tau)$, is a function of all the rewards received in the trajectory
e.g. $R(\tau) = \sum_t r_t$, or $R(\tau) = \sum_t \gamma^t r_t$
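Both example forms of $R(\tau)$ can be computed with one small function, sketched below; the name `trajectory_return` is hypothetical. Setting `gamma=1.0` gives the plain sum $\sum_t r_t$, while any other value gives the discounted sum $\sum_t \gamma^t r_t$.

```python
def trajectory_return(rewards, gamma=1.0):
    """Return sum_t gamma**t * r_t for a list of per-step rewards.

    Accumulates from the last reward backwards, so that each step is
    total = r_t + gamma * (return of the remaining steps).
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, `trajectory_return([1, 1, 1])` sums the rewards without discounting, while `trajectory_return([1, 1, 1], gamma=0.5)` weights them by 1, 0.5 and 0.25.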
Apprenticeship Learning
1. Inverse Reinforcement Learning (IRL): Use the dataset of expert demonstrations to uncover the reward function that the expert is trying to optimize.
• This reward function is expected to succinctly encode the expert's behavior…
2. Reinforcement Learning (RL): Learn the optimal policy for this recovered reward function using RL.
Diagram: expert demonstrations → IRL → reward function → RL → optimum policy
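The talk does not specify an IRL algorithm, so the following is only one possible illustration of the IRL step. It assumes the classical setting in which the reward is linear in state features, $r(s) = w \cdot \phi(s)$, and compares discounted feature expectations of the expert with those of a candidate policy. The feature map `phi`, the toy trajectories and the simple choice $w = \mu_E - \mu_\pi$ are all assumptions.

```python
import numpy as np


def feature_expectations(trajectories, phi, gamma):
    """Empirical estimate of mu = E[sum_t gamma**t * phi(s_t)],
    averaged over a list of state sequences."""
    mu = None
    for states in trajectories:
        total = sum((gamma ** t) * phi(s) for t, s in enumerate(states))
        mu = total if mu is None else mu + total
    return mu / len(trajectories)


def phi(s):
    """Hypothetical feature map for a scalar state."""
    return np.array([s, 1.0])


# Made-up state sequences for the expert and for a candidate policy.
expert_trajs = [[0, 1], [2, 3]]
agent_trajs = [[4, 4]]

mu_expert = feature_expectations(expert_trajs, phi, gamma=0.5)
mu_agent = feature_expectations(agent_trajs, phi, gamma=0.5)

# Under the linear-reward assumption, a weight vector pointing from the
# candidate's feature expectations towards the expert's scores the
# expert's behavior higher than the candidate's.
w = mu_expert - mu_agent
```

An RL step would then optimize a policy for the reward $w \cdot \phi(s)$, and the two steps would alternate until the feature expectations match closely enough.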
Pros and Cons of Apprenticeship Learning
• Advantages:
  • Does not make isolated single time-step decisions, so compounding error is not a problem, unlike in behavioral cloning
• Drawbacks:
  • IRL is computationally expensive because it needs RL to run in an inner loop
  • Scalability issues in large environments
  • The agent needs to act in the environment during learning, which may be unsafe in risk-sensitive applications
Active Learning
Active Learning
In Active Learning the agent is able to query the expert for an optimal action in any given state and use these active samples to improve its policy
Flowchart: state → agent confidence
• High: agent takes action
• Low: agent queries expert → expert provides action → agent takes action → agent rectifies policy
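The flowchart can be sketched as a single decision step. All names here (`active_step`, the confidence function, the threshold of 0.8 and the toy policies) are hypothetical. How the confidence is estimated and how the policy is rectified from the collected samples are left open.

```python
def active_step(state, agent_policy, agent_confidence, expert_policy,
                dataset, threshold=0.8):
    """One decision step: act autonomously when confident,
    otherwise query the expert and record the active sample.

    Returns (action, queried_expert).
    """
    if agent_confidence(state) >= threshold:
        return agent_policy(state), False
    action = expert_policy(state)      # low confidence: ask the expert
    dataset.append((state, action))    # kept for a later policy update
    return action, True


# Hypothetical toy: the agent only trusts itself near the lane center.
def agent_policy(s):
    return -s


def agent_confidence(s):
    return 1.0 if abs(s) <= 1 else 0.2


def expert_policy(s):
    return -0.5 * s


dataset = []
confident = active_step(0.5, agent_policy, agent_confidence, expert_policy, dataset)
unsure = active_step(3.0, agent_policy, agent_confidence, expert_policy, dataset)
```

After these two calls, `dataset` holds only the sample from the low-confidence state, which is the data the agent would use to rectify its policy.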
Workflow of Active Learning
1. Train the agent by behavioral cloning
2. Deploy the agent in the real world in the presence of an expert
3. The agent queries the expert whenever it is in doubt and rectifies itself
Pros and Cons of Active Learning
• Advantages:
• Safe during both training and testing
• Drawbacks:
• Getting robust confidence estimates is tough
• Requires longer supervision by the expert
Issue of Safety
Types of Safety
• Safety during training
• Safety after deployment
Different Approaches to Ensuring Safety
• Vigilance during exploration
  • External knowledge
    • Prior knowledge
    • Expert demonstration
    • Teacher advice
  • Risk-directed exploration
• Engineering the optimization criterion
  • Worst-case criteria
  • Risk-sensitive criteria
  • Constrained criteria
Case study on how to make
an existing algorithm safe
GAIL: Generative Adversarial Imitation Learning
Problem of heavy tail
RAIL: Risk-Averse Imitation Learning
Santara et al. 2017. Accepted at Deep Reinforcement Learning Symposium at NIPS 2017
CVaR of trajectory risk
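For readers unfamiliar with CVaR (Conditional Value at Risk), here is a generic empirical estimator over a sample of per-trajectory costs. It is not taken from the RAIL paper: the function name, the quantile-based estimator and the sample costs are assumptions. The Value at Risk at level $\alpha$ is the $\alpha$-quantile of the cost, and CVaR is the mean cost in the tail at or beyond that quantile, so it is sensitive to exactly the heavy tail that an average would hide.

```python
import numpy as np


def var_cvar(costs, alpha=0.9):
    """Empirical VaR and CVaR of trajectory costs at level alpha.

    VaR  = alpha-quantile of the costs (linear interpolation).
    CVaR = mean of the costs that are >= VaR (the worst tail).
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)
    cvar = costs[costs >= var].mean()
    return var, cvar


# Made-up trajectory costs with one very bad outcome.
costs = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]
var, cvar = var_cvar(costs, alpha=0.5)
```

In this toy sample the median cost is small, but the tail average is pulled up by the single catastrophic trajectory, which is the kind of behavior a risk-averse objective penalizes.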
Results
Any Questions, Please?
Thank You

More Related Content

What's hot

Model based rl
Model based rlModel based rl
Model based rlSeolhokim
 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningKai-Wen Zhao
 
Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningSeung Jae Lee
 
Temporal difference learning
Temporal difference learningTemporal difference learning
Temporal difference learningJie-Han Chen
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learningSubrat Panda, PhD
 
Lecture 4 neural networks
Lecture 4 neural networksLecture 4 neural networks
Lecture 4 neural networksParveenMalik18
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningCloudxLab
 
Empirical Mode Decomposition of the signal
Empirical Mode Decomposition of the signal Empirical Mode Decomposition of the signal
Empirical Mode Decomposition of the signal Harshal Chaudhari
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningJungyeol
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)pauldix
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent methodSanghyuk Chun
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: ClusteringDeepak George
 
Autonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an IntroductionAutonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an IntroductionMichael Bosello
 

What's hot (20)

Model based rl
Model based rlModel based rl
Model based rl
 
kalman filtering "From Basics to unscented Kaman filter"
 kalman filtering "From Basics to unscented Kaman filter" kalman filtering "From Basics to unscented Kaman filter"
kalman filtering "From Basics to unscented Kaman filter"
 
FSM and ASM
FSM and ASMFSM and ASM
FSM and ASM
 
04 Multi-layer Feedforward Networks
04 Multi-layer Feedforward Networks04 Multi-layer Feedforward Networks
04 Multi-layer Feedforward Networks
 
AI Lecture 3 (solving problems by searching)
AI Lecture 3 (solving problems by searching)AI Lecture 3 (solving problems by searching)
AI Lecture 3 (solving problems by searching)
 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-Learning
 
Kalman Filter | Statistics
Kalman Filter | StatisticsKalman Filter | Statistics
Kalman Filter | Statistics
 
Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference Learning
 
Artificial Intelligent Agents
Artificial Intelligent AgentsArtificial Intelligent Agents
Artificial Intelligent Agents
 
Temporal difference learning
Temporal difference learningTemporal difference learning
Temporal difference learning
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
Lecture 4 neural networks
Lecture 4 neural networksLecture 4 neural networks
Lecture 4 neural networks
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Empirical Mode Decomposition of the signal
Empirical Mode Decomposition of the signal Empirical Mode Decomposition of the signal
Empirical Mode Decomposition of the signal
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: Clustering
 
Autonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an IntroductionAutonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an Introduction
 

Similar to Imitation Learning

An Introduction to Reinforcement Learning - The Doors to AGI
An Introduction to Reinforcement Learning - The Doors to AGIAn Introduction to Reinforcement Learning - The Doors to AGI
An Introduction to Reinforcement Learning - The Doors to AGIAnirban Santara
 
RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...
RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...
RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...Anirban Santara
 
What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018Gourab Nath
 
Recommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningRecommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningArithmer Inc.
 
Machine Learning Model Evaluation Methods
Machine Learning Model Evaluation MethodsMachine Learning Model Evaluation Methods
Machine Learning Model Evaluation MethodsPyingkodi Maran
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningKhaled Saleh
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceinventy
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningPrabhu Kumar
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachLorenzo Cesaretti
 
IRJET- The Machine Learning: The method of Artificial Intelligence
IRJET- The Machine Learning: The method of Artificial IntelligenceIRJET- The Machine Learning: The method of Artificial Intelligence
IRJET- The Machine Learning: The method of Artificial IntelligenceIRJET Journal
 
Presentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaPresentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaLuca Marignati
 
Imitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCSImitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCSPreferred Networks
 
Lecture on AI and Machine Learning
Lecture on AI and Machine LearningLecture on AI and Machine Learning
Lecture on AI and Machine LearningXiaonan Wang
 
A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...IJERA Editor
 
A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...IJERA Editor
 
SPLT Transformer.pptx
SPLT Transformer.pptxSPLT Transformer.pptx
SPLT Transformer.pptxSeungeon Baek
 

Similar to Imitation Learning (20)

An Introduction to Reinforcement Learning - The Doors to AGI
An Introduction to Reinforcement Learning - The Doors to AGIAn Introduction to Reinforcement Learning - The Doors to AGI
An Introduction to Reinforcement Learning - The Doors to AGI
 
RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...
RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...
RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at K...
 
What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018
 
Recommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningRecommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learning
 
Machine Learning Model Evaluation Methods
Machine Learning Model Evaluation MethodsMachine Learning Model Evaluation Methods
Machine Learning Model Evaluation Methods
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
 
Operations Research
Operations ResearchOperations Research
Operations Research
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
 
IRJET- The Machine Learning: The method of Artificial Intelligence
IRJET- The Machine Learning: The method of Artificial IntelligenceIRJET- The Machine Learning: The method of Artificial Intelligence
IRJET- The Machine Learning: The method of Artificial Intelligence
 
ddpg seminar
ddpg seminarddpg seminar
ddpg seminar
 
03_Optimization (1).pptx
03_Optimization (1).pptx03_Optimization (1).pptx
03_Optimization (1).pptx
 
Presentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaPresentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in Informatica
 
Imitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCSImitation Learning for Autonomous Driving in TORCS
Imitation Learning for Autonomous Driving in TORCS
 
Lecture on AI and Machine Learning
Lecture on AI and Machine LearningLecture on AI and Machine Learning
Lecture on AI and Machine Learning
 
module_1_ppt.pdf
module_1_ppt.pdfmodule_1_ppt.pdf
module_1_ppt.pdf
 
A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...
 
A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...A Hybrid method of face detection based on Feature Extraction using PIFR and ...
A Hybrid method of face detection based on Feature Extraction using PIFR and ...
 
SPLT Transformer.pptx
SPLT Transformer.pptxSPLT Transformer.pptx
SPLT Transformer.pptx
 

Recently uploaded

PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 

Recently uploaded (20)

PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf

Imitation Learning

  • 1. Department of Computer Science and Engineering IIT Kharagpur Imitation Learning: Learning to Act like Humans from Humans SSN College of Engineering – Faculty Development Program Talk 10:45-12:15 IST, 25 Nov 2017 Anirban Santara santara.github.io
  • 2. About me: Anirban Santara. Google India Ph.D. Fellow at IIT Kharagpur (2015-Present). Graduate Research Intern at Intel Labs for Autonomous Driving (2017-Present). B.Tech. in Electronics and Electrical Communication Engineering from IIT Kharagpur in 2015.
  • 3. Contents: 1. Building the motivation 2. Problem definition and Different Approaches to Solution 3. Issues of Safety and Reliability
  • 4. Description of the Imitation Learning Problem
  • 5. Imitation Learning: Imitation Learning techniques aim to mimic human behavior at a given task [1]. [1] Hussein, Ahmed, et al. "Imitation Learning: A Survey of Learning Methods." ACM Computing Surveys (CSUR) 50.2 (2017): 21. Image Source: GRASP lab - University of Pennsylvania
  • 6. Why should you care?
      • Imitation learning methods are rooted in neuroscience and form an important part of learning in humans
      • They make it possible to teach robots complex tasks with minimal expert knowledge of the tasks
      • No need for explicit programming or task-specific reward function design
      • It's high time!
          • Modern sensors are able to collect and transmit high volumes of data at high speed
          • High performance computing is cheaper, more capable and more ubiquitous than ever
          • Virtual Reality systems, which are considered the best portal of human-machine interaction, are widely available
  • 7. Example Application Areas
  • 8. Autonomous Driving: No more accidents due to human error. No more traffic jams.
  • 9. Robotic Surgery: Complex Actions in Critical Situations – Accurate. Every time.
  • 10. Industrial Automation: Efficiency. Precise Quality Control. Safety.
  • 11. Assistive Robotics: Elderly Care. Rehabilitation. Special Needs.
  • 12. Conversational Agents: Assistance. Recommendation. Therapy.
  • 13. Approaches to Solution
  • 14. A quick primer on Machine Learning. Reference application: Driving a Racing Car
      • State variables (X):
          • Position in track
          • Distance from track edges along different directions
          • Direction of heading
          • Current speed
      • Action variables (Y):
          • Steering
          • Acceleration
          • Brake
  • 15. Comparison of ML paradigms
      • Supervised Learning
          • Would require training examples of the form $\{X_i, Y_i\}_{i=1}^{N}$
          • where the $Y_i$ are the true/correct actions that must be taken in state $X_i$
      • Unsupervised Learning
          • Works only with the input state information $X_i$
          • Does not use any kind of feedback from the environment regarding the performance of the agent
      • Reinforcement Learning
          • Requires feedback from the environment in the form of reward signals
          • Reward signals might be sparse and delayed
          • But they should indicate the quality of the actions taken by the agent in different states, e.g. +1 if the car makes progress, -1 if it comes to a halt, -10 if it bumps into an obstacle, 100 if it finishes the race
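The reward scheme quoted for the racing-car example can be written as a small function. This is only a minimal sketch: the four reward values come from the slide, but the flag names (`made_progress`, `halted`, `collided`, `finished`), the priority among simultaneous events, and the neutral reward of 0 are assumptions made for illustration.

```python
def racing_reward(made_progress, halted, collided, finished):
    """Reward for one time step of the racing-car example."""
    # The priority order among simultaneous events is an assumption of this sketch.
    if finished:
        return 100   # finishes the race
    if collided:
        return -10   # bumps into an obstacle
    if halted:
        return -1    # comes to a halt
    if made_progress:
        return 1     # makes progress
    return 0         # assumed neutral reward when none of the events occurs
```

A reward of this kind is sparse and delayed in the sense of the slide: the large payoff of 100 arrives only at the end of a race, long after most of the decisions that earned it.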
  • 16. Problem Setting: Our Agent has to achieve its goal by taking a sequence of actions in an environment whose states change in response to the agent's actions. [Figure: Agent sends Action to Environment; Environment returns New State to Agent]
  • 17. Mathematical Formulation: Markov Decision Process (MDP)
      Imitation Learning problems are often specified in terms of a Markov Decision Process (MDP). An MDP is defined as $\mathcal{M} = (S, A, T, r, \rho_0, \gamma)$
      • State space $S$: set of all possible states/configurations of the environment
      • Action space $A$: set of all possible actions
      • Transition probability $T$: maps each state-action pair in $S \times A$ to a distribution over next states in $S$; $T(s_t, a_t) = P(s_{t+1} \mid s_t, a_t)$
      • Reward function $r: S \times A \to \mathbb{R}$; we write $r(s_t, a_t) = r_t$
      • Initial state distribution $\rho_0$; $\rho_0(s) = P(s_0 = s)$
      • Temporal discount factor $\gamma$
      "Markov" because it assumes: $P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_t, a_t) = T(s_t, a_t)$
  • 18. Some more definitions
      • Policy $\pi: S \to A$: a function that predicts actions for a given state
      • Trajectory $\tau$: a sequence of $(s_t, a_t)$ tuples that describes an episode of experiences of an agent as it executes a policy: $\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$
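To make the MDP, policy and trajectory definitions concrete, here is a minimal sketch of a tiny tabular MDP and a rollout that produces a trajectory. Everything specific in it is hypothetical: the two states, the two actions, the transition probabilities and the constant policy are invented for illustration and do not come from the talk.

```python
import random

# Hypothetical two-state, two-action MDP; all numbers are illustrative only.
STATES = ["on_track", "off_track"]
ACTIONS = ["steer", "coast"]

# T[(s, a)] maps each next state to P(s_{t+1} | s_t = s, a_t = a).
T = {
    ("on_track", "steer"): {"on_track": 0.9, "off_track": 0.1},
    ("on_track", "coast"): {"on_track": 0.6, "off_track": 0.4},
    ("off_track", "steer"): {"on_track": 0.7, "off_track": 0.3},
    ("off_track", "coast"): {"on_track": 0.1, "off_track": 0.9},
}
RHO_0 = {"on_track": 1.0, "off_track": 0.0}  # initial state distribution


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def policy(state):
    """A deterministic policy pi: S -> A (here it always steers)."""
    return "steer"


def rollout(policy, horizon, rng):
    """Return tau = [s_0, a_0, s_1, a_1, ..., s_T] with T = horizon."""
    state = sample(RHO_0, rng)
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)
        # Markov property: the next state depends only on (s_t, a_t).
        state = sample(T[(state, action)], rng)
        trajectory += [action, state]
    return trajectory


tau = rollout(policy, horizon=5, rng=random.Random(0))
```

The trajectory alternates states and actions and ends on a state, matching the form of $\tau$ given in the definitions.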
  • 19. Approaches to Imitation Learning: Broad Categories
      • Learning from a dataset of expert demonstrations
          • Behavioral Cloning
          • Apprenticeship Learning
      • Active learning with an expert
  • 20. Learning from a Dataset of Expert Demonstrations
  • 21. Problem Definition
      • Given: a dataset of trajectories demonstrated by an expert, $\{\tau_i\}_{i=1}^{N}$, where each trajectory is a sequence of states and actions: $\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$
      • Goal: find a policy $\pi^*$ that achieves "expert-like performance"
  • 22. Behavioral Cloning: Supervised learning of a mapping from states to the expert's actions in those states. [Figure: a model takes the state $x = (x_1, x_2, \ldots, x_n)$ and outputs an action; the loss is a statistical divergence between the model's output and the expert action $a$, and it is minimized w.r.t. the model parameters]
  • 23. Pros and Cons of Behavioral Cloning
      • Advantages:
          • Simplicity!
      • Drawbacks:
          • Fails to work well with limited data
          • Assumes that observations are i.i.d. and learns to fit single time-step decisions. This leads to the problem of compounding error due to covariate shift
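Behavioral cloning as described above reduces to ordinary supervised regression from states to expert actions. The following is a minimal sketch under stated assumptions: the model is a one-dimensional linear map, mean squared error stands in for the statistical divergence, and the demonstrations (lateral offset as state, steering command as action, an expert gain of -0.5) are hypothetical.

```python
def fit_behavioral_cloning(states, actions, lr=0.1, epochs=500):
    """Fit a_hat = w * x + b to expert (state, action) pairs by gradient descent."""
    w, b = 0.0, 0.0
    n = len(states)
    for _ in range(epochs):
        # Gradients of the mean squared error between predicted and expert actions.
        grad_w = sum(2 * (w * x + b - a) * x for x, a in zip(states, actions)) / n
        grad_b = sum(2 * (w * x + b - a) for x, a in zip(states, actions)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical demonstrations: state = lateral offset from the track centre,
# action = steering command; the expert steers back with gain -0.5.
expert_states = [-1.0, -0.5, 0.0, 0.5, 1.0]
expert_actions = [-0.5 * x for x in expert_states]

w, b = fit_behavioral_cloning(expert_states, expert_actions)
```

Each (state, action) pair is treated as an independent sample, which is exactly the i.i.d. assumption listed among the drawbacks: nothing in the loss accounts for the fact that the learner's own mistakes change which states it visits next.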
  • 24. Apprenticeship Learning
  • 25. Reinforcement Learning: Reinforcement Learning refers to learning through trial and error using feedback from the environment. [Figure: Agent sends Action to Environment; Environment returns Reward and New State to Agent]
  • 26. Goal of RL: Find a policy $\pi^*$ that maximizes the expectation of the reward function $R(\tau)$ over trajectories $\tau$: $\pi^* = \arg\max_{\pi} \mathbb{E}_{\tau}[R(\tau)]$. The reward of a trajectory $R(\tau)$ is a function of all the rewards received in the trajectory, e.g. $R(\tau) = \sum_t r_t$ or $R(\tau) = \sum_t \gamma^t r_t$
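The two example forms of the trajectory reward, the plain sum and the discounted sum, can be computed directly from a list of per-step rewards. This is a minimal sketch; the reward sequence and the discount factor below are illustrative values, not results from the talk.

```python
def undiscounted_return(rewards):
    """R(tau) = sum_t r_t."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """R(tau) = sum_t gamma^t * r_t, with t starting at 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Illustrative episode: two steps of progress, one collision, then the finish.
rewards = [1, 1, -10, 100]
print(undiscounted_return(rewards))      # 92
print(discounted_return(rewards, 0.5))   # 1 + 0.5 - 2.5 + 12.5 = 11.5
```

With a discount factor below 1, rewards that arrive later count for less, so the large terminal reward contributes only 12.5 of its nominal 100 here.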
  • 27. Apprenticeship Learning
      1. Inverse Reinforcement Learning (IRL): Use the dataset of expert demonstrations to uncover the reward function that the expert is trying to optimize.
          • This reward function is expected to succinctly encode the expert's behavior…
      2. Reinforcement Learning (RL): Learn the optimal policy for this recovered reward function using RL.
      [Figure: expert demonstrations → IRL → reward function → RL → optimum policy]
  • 28. Pros and Cons of Apprenticeship Learning
      • Advantages:
          • Does not take single time-step decisions, and hence compounding error is not a problem, unlike in behavioral cloning
      • Drawbacks:
          • IRL is computationally expensive because it needs RL to run in an inner loop
          • Scalability issues in large environments
          • The agent needs to act in the environment during learning; this may be unsafe in risk-sensitive applications
  • 29. Active Learning
  • 30. Active Learning: In Active Learning the agent is able to query the expert for an optimal action in any given state and use these active samples to improve its policy. [Figure: given a state, the agent checks its confidence. High: the agent takes its action. Low: the agent queries the expert's action, takes that action and rectifies its policy]
  • 31. Workflow of Active Learning
      1. Train the agent by behavioral cloning
      2. Deploy the agent in the real world in the presence of an expert
      3. The agent queries the expert whenever it is in doubt and rectifies itself
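The query-when-in-doubt workflow can be sketched as a short loop. This is a toy illustration under loud assumptions: the lookup-table policy stands in for a model trained by behavioral cloning, the confidence estimate (1 for states seen before, 0 otherwise) is a crude placeholder, and `expert_policy`, `QueryingAgent` and the integer states are all hypothetical names invented here.

```python
def expert_policy(state):
    """Hypothetical expert: steer against the sign of the lateral offset."""
    return "left" if state > 0 else "right"


class QueryingAgent:
    def __init__(self, demonstrations, threshold=0.5):
        # Step 1 of the workflow: initialise from demonstrations (a lookup
        # table stands in for a policy trained by behavioral cloning).
        self.policy = dict(demonstrations)
        self.threshold = threshold
        self.num_queries = 0

    def confidence(self, state):
        # Crude stand-in for a confidence estimate: 1 for seen states, else 0.
        return 1.0 if state in self.policy else 0.0

    def act(self, state, expert):
        if self.confidence(state) < self.threshold:
            # Low confidence: query the expert and rectify the policy.
            self.policy[state] = expert(state)
            self.num_queries += 1
        # High confidence (or freshly rectified): take the policy's action.
        return self.policy[state]


agent = QueryingAgent({1: "left", -1: "right"})
actions = [agent.act(s, expert_policy) for s in [1, 2, -1, 2, -3]]
```

In this run the agent queries the expert only for the two unseen states (2 and -3) and reuses the rectified entry when state 2 comes up again. A practical system would need a far more robust confidence estimate than this placeholder.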
  • 32. Pros and Cons of Active Learning
      • Advantages:
          • Safe during both training and testing
      • Drawbacks:
          • Getting robust confidence estimates is tough
          • Requires longer supervision by the expert
  • 33. Issue of Safety
  • 34. Types of Safety: Safety during training. Safety after deployment.
  • 35. Different Approaches to Ensuring Safety
      • Vigilance during exploration
          • External knowledge
              • Prior knowledge
              • Expert demonstration
              • Teacher advice
          • Risk-directed exploration
      • Engineering the optimization criterion
          • Worst-case criteria
          • Risk-sensitive criteria
          • Constrained criteria
  • 36. Case study on how to make an existing algorithm safe
  • 37. GAIL: Generative Adversarial Imitation Learning. Problem of heavy tail
  • 38. RAIL: Risk-Averse Imitation Learning. Santara et al. 2017. Accepted at the Deep Reinforcement Learning Symposium at NIPS 2017. CVaR of trajectory risk
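The slide names CVaR (Conditional Value at Risk) without spelling it out. As a hedged illustration, the sketch below computes the standard empirical CVaR of a set of trajectory costs: the mean of the worst $(1 - \alpha)$ fraction of outcomes. It is not a reproduction of the RAIL objective, and the function name `empirical_cvar` and the cost values are assumptions made for this example.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of costs (higher cost = worse)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # Round before ceil so floating-point noise does not enlarge the tail.
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


# Hypothetical trajectory costs with a heavy tail: most episodes are cheap,
# a few are catastrophic.
costs = [1, 1, 1, 1, 1, 1, 1, 1, 10, 30]
mean_cost = sum(costs) / len(costs)       # 4.8
tail_risk = empirical_cvar(costs, 0.8)    # (30 + 10) / 2 = 20.0
```

The gap between the mean cost and the tail risk is what a heavy tail looks like numerically: average performance appears fine while the worst episodes are far more costly.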
  • 39. Results
  • 40. Any Questions, Please. Scan me to give Anirban feedback
  • 41. Thank You