Scalable Robust Learning from
Demonstration with Leveraged
Deep Neural Networks
Sungjoon Choi, Kyungjae Lee, and Songhwai Oh
Seoul National University
Contents
 Learning from Demonstration
 Preliminaries
 Leveraged Gaussian Process
 Leverage Optimization
 Leveraged Deep Neural Network
 Experiment
 Conclusion
Learning from Demonstration
[Diagram: Human Expert → Learning from Demonstration → Execute in Unseen Environments]
Existing Limitations
Most LfD methods assume the optimality of demonstrated actions.
We want to incorporate demonstrations with mixed qualities without labeling them.
Existing Limitations
Some LfD methods are not scalable.
We want to incorporate large-scale demonstrations.
Leveraged Gaussian Process
Choi et al., "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," ICRA 2015
A standard Gaussian process models trajectories 𝜉𝑗, 𝑗 = 1, …, 𝑁, generated by a single policy 𝜋.
A leveraged Gaussian process models trajectories 𝜉𝑖𝑗, 𝑖 = 1, …, 𝑀, 𝑗 = 1, …, 𝑁, generated by 𝑀 policies 𝜋𝑖, each with a leverage parameter 𝛾𝑖, where the correlation between 𝜋𝑖 and 𝜋𝑖′ is defined by cos((𝜋/2)(𝛾𝑖 − 𝛾𝑖′)) (𝛾𝑖 is between −1 and +1).
For example, demonstrations may be assigned leverages 𝛾 = 1.0, 𝛾 = 0.8, 𝛾 = −0.5, or 𝛾 = 0.
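The leverage correlation above can be sketched in code. Note that the squared-exponential base kernel and the way it is modulated by the correlation term are illustrative assumptions, not the paper's exact leveraged kernel:

```python
import numpy as np

def leverage_correlation(g_i, g_j):
    # Correlation between two policies with leverages g_i, g_j in [-1, +1]:
    # cos((pi/2) * (g_i - g_j)), as defined on the slide.
    return np.cos(0.5 * np.pi * (g_i - g_j))

def leveraged_kernel(x_i, x_j, g_i, g_j, length_scale=1.0):
    # Hypothetical sketch: a squared-exponential base kernel over inputs,
    # modulated by the leverage correlation (assumed combination).
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    base = np.exp(-np.sum(diff ** 2) / (2.0 * length_scale ** 2))
    return base * leverage_correlation(g_i, g_j)
```

Under this correlation, two fully trusted demonstrations (𝛾 = 1) correlate like an ordinary kernel, a 𝛾 = +1 and a 𝛾 = −1 demonstration are perfectly anti-correlated, and 𝛾 = 0 decouples a demonstration entirely.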
Leveraged Gaussian Process
Positive definiteness of the leveraged kernel function can be shown with Bochner's theorem
(an isotropic kernel function is PSD iff its Fourier coefficients are non-negative).
[Diagram: Leveraged Kernel Function → Leveraged Gaussian Process]
However, Gaussian process regression is not suitable for large training sets (exact inference scales cubically in the number of training points).
Leverage Optimization
[Figure: dataset with mixed qualities]
Choi et al., "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," ICRA 2016
Leverage Optimization
[Figure: dataset with mixed qualities, with optimized leverages of 1, 0.9, 0.7, −1, and 0 assigned to individual demonstrations]
Leverage Optimization
The key intuition behind leverage optimization is to cast it as a model selection problem in Gaussian process regression.
However, the number of leverage parameters equals the number of training data.
To handle this issue, we present a sparse-constrained leverage optimization that assumes the majority of leverage parameters are +1.
The resulting optimization problem minimizes the negative log likelihood −L(⋅) with a sparsity constraint on 𝛾̃ = 𝛾 − 1.
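One way to picture the sparse constraint is a proximal-gradient loop that pulls most leverages toward +1. The soft-thresholding update, step sizes, and the placeholder likelihood gradient below are all assumptions for illustration, not the paper's actual solver:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||v||_1: shrinks entries toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def optimize_leverage(grad_neg_log_lik, gamma0, lam=1.0, step=0.1, iters=200):
    # Hypothetical sketch: minimize -L(gamma) plus a sparsity penalty on
    # gamma_tilde = gamma - 1, so that most leverages stay at +1.
    gamma = np.asarray(gamma0, dtype=float).copy()
    for _ in range(iters):
        g_tilde = (gamma - 1.0) - step * grad_neg_log_lik(gamma)
        gamma = np.clip(soft_threshold(g_tilde, step * lam) + 1.0, -1.0, 1.0)
    return gamma
```

With a flat likelihood (zero gradient), the penalty alone drives every leverage to +1, matching the assumption that most demonstrations are good; the data term then moves only the few leverages the likelihood strongly disagrees with.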
Leveraged Neural Network
• We propose a leveraged deep neural network built on a leveraged cost function, obtained by interpreting the objective function of the leveraged Gaussian process via the representer theorem.
Representer theorem + empirical risk term (assume 𝜸𝒊 is either −1 or +1).
Leveraged Neural Network
[Diagram: the leveraged cost function combines a parameterized estimator, a regularizer, and a new target formed from the input, the original target, and the leverage.]
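A toy version of a leveraged cost for 𝛾 ∈ {−1, +1} can make the idea concrete: positive-leverage demonstrations are fitted, negative-leverage ones are repelled, with the repulsion bounded so the cost stays well defined. This specific functional form is an illustration only, not the paper's exact leveraged cost function:

```python
import numpy as np

def leveraged_cost(pred, target, gamma, weight_decay=1e-4, params_norm2=0.0):
    # Toy sketch (not the paper's exact form): with gamma in {-1, +1},
    # gamma = +1 demos contribute a squared error (attraction) and
    # gamma = -1 demos a bounded negated error (repulsion); a
    # weight-decay term stands in for the regularizer.
    err2 = (np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)) ** 2
    gamma = np.asarray(gamma, dtype=float)
    risk = np.sum(np.where(gamma > 0, err2, -np.minimum(err2, 1.0)))
    return risk + weight_decay * params_norm2
```

Minimizing this cost pushes predictions toward 𝛾 = +1 targets and away from 𝛾 = −1 targets, which is the behavior the leveraged cost function is designed to give a neural network.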
Experiment: Setting
Input: (𝑑_L^B, 𝑑_L^F, 𝑑_C^B, 𝑑_C^F, 𝑑_R^B, 𝑑_R^F, 𝑑_dev) ∈ ℝ⁷
Output: 𝜃_dev ∈ ℝ
*Two hidden layers (512 units each) with tanh activation functions.
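The policy network from the slide (7 distance/deviation features in, one deviation angle out, two 512-unit tanh hidden layers) can be sketched as a plain forward pass; the random weights below are illustrative stand-ins for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Architecture from the slide: 7-D input, two 512-unit tanh hidden
# layers, scalar output (random initialization, untrained).
W1, b1 = rng.normal(0.0, 0.05, (7, 512)), np.zeros(512)
W2, b2 = rng.normal(0.0, 0.05, (512, 512)), np.zeros(512)
W3, b3 = rng.normal(0.0, 0.05, (512, 1)), np.zeros(1)

def policy(d):
    # d: the 7 input features (lane/center/road distances and deviation)
    h1 = np.tanh(d @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return (h2 @ W3 + b3).item()  # predicted deviation angle
```

In the experiments this mapping is trained with the leveraged cost function instead of a plain squared error, which is what lets the same architecture absorb mixed-quality demonstrations.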
Experiment: Data Collection
Experiment: Leverage Optimization
[Figure: optimized leverages on the dataset with mixed qualities]
Safe:Inexp:Suicidal = 70:15:15
Experiment: Driving Results
[Videos: Gaussian process without optimization vs. leveraged Gaussian process (5,000 demos) vs. leveraged deep neural network (20,000 demos)]
Safe:Inexp:Suicidal = 70:15:15
Conclusion
 A robust and scalable learning from demonstration method is presented.
 Robustness comes from the leverage optimization [1].
 Scalability comes from the leveraged deep neural network using the proposed leveraged cost function.
 The proposed method is successfully applied to a track driving task where the demonstrations are collected from multiple modes with different proficiencies.
 Further work will focus on incorporating the uncertainty information of a model prediction using a Bayesian neural network, where initial results can be found in [2].
[1] Choi et al., "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," ICRA 2016
[2] Choi et al., "Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling," arXiv:1709.02249, 2017
Thank you for your attention.
Contact: sungjoon.choi@cpslab.snu.ac.kr
Uncertainty-Aware LfD
Choi et al., "Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling," arXiv:1709.02249, 2017
IROS 2017 Slides
