Robot Learning with Structured Knowledge and Richer Sensing
Akihiko Yamaguchi
Robotics Institute, Carnegie Mellon University
Manipulations in Everyday Activities
Folding clothes
Cleaning
Cooking
Bathing
Dressing
…
7
Japanese way of folding T-shirts: https://youtu.be/b5AWQ5aBjgE
Chinese cooking skills: https://youtu.be/PFGGTPPNdRQ
Dynamic Programming & Reinforcement Learning
8
• Tasty / Awful
• “I am satisfied”
• …
Dynamic Programming: when the dynamics models {F_k} are given
Reinforcement Learning: when {F_k} are unknown
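To make the distinction concrete, here is a minimal sketch (not from the slides; the MDP below is a random placeholder) of finite-horizon dynamic programming when the transition models {F_k} are given. When {F_k} are unknown, this backup cannot be computed directly and the problem becomes reinforcement learning.

```python
# Minimal sketch: finite-horizon dynamic programming (value iteration) when the
# transition models {F_k} are given.  States, actions, and rewards are random
# placeholders, not a real manipulation task.
import numpy as np

n_states, n_actions, horizon = 5, 2, 10
rng = np.random.default_rng(0)

# Known dynamics: P[a, s, s'] = transition probability, R[s, a] = reward.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

V = np.zeros(n_states)                       # terminal values
for k in reversed(range(horizon)):
    Q = R + np.einsum('ast,t->sa', P, V)     # backup through the known model
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)                    # greedy first-step policy
print("V:", np.round(V, 2), "policy:", policy)
```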
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
Yamaguchi et al., "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
Deep Reinforcement Learning
Deep learning: with big data, a neural network can learn any input/output
mapping to any precision. We do not have to worry about how large the state
space is, and images can be handled directly as input without designing features.
Deep RL: using deep neural networks to represent policies, dynamics models,
value functions, etc. Deep RL can handle large state spaces given big data.
E.g. Atari games, Google's (S. Levine) learned visual servoing.
11
Deep Reinforcement Learning
12
(T-L) Learning to play Atari games, Google DeepMind, Mnih et al. 2015: https://youtu.be/cjpEIotvwFY
(T-R) DeepMPC robotic experiments - PR2 cuts food, Lenz et al. 2015: https://youtu.be/BwA90MmkvPU
(B-L) Learning to grasp from 50K tries, Pinto et al. 2016: https://youtu.be/oSqHc0nLkm8
(B-R) Learning hand-eye coordination for robotic grasping, Levine et al. 2017: https://youtu.be/l8zKZLqkfII
Deep Reinforcement Learning
Can Deep RL solve RL problems in general?
Maybe YES
Is that the intelligence we expect of robots?
Maybe NO
Learning grasping: 50,000 samples (Pinto et al. 2016),
800,000 samples (Levine et al. 2017)
How many samples are necessary to learn to cook sushi?
The strategy for designing a problem is unclear
How to learn with fewer samples is unclear
13
Intelligent Robot
An English proverb says:
"A word to the wise is enough."
"Many words to a fool, half a word to the wise."
In Japanese: 一を知って十を知る ("know ten from knowing one")
Robot version:
Many practice trials for a foolish robot, half a practice trial for the
intelligent robot.
14
How do we measure intelligence of robots?
Adaptation ability
Generalization ability
Scalability
15
(From a talk by Leslie Kaelbling at ICRA'16)
Key components to create intelligent robots
???
???
???
???
16
Key components to create intelligent robots
Library of skills
Structured knowledge
Learning and reasoning methods
Richer sensing and general hardware
18
My Work (Introduced today)
Deformable object manipulation (liquids, powders,
vegetables and fruits, etc.)
Representing behaviors with a skill library;
verified in PR2 and Baxter pouring experiments
Model-based RL with structured knowledge;
verified in simulated pouring
Richer sensing helps learning: Liquid flow
perception, FingerVision
19
Library of skills is essential
20
(Image sources: http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg,
http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg,
http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg,
http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg,
http://www.nescafe.com/upload/golden_roast_f_711.png)
Pouring Behavior with Skill Library
Skill library
flow ctrl (tip, shake, …), grasp, move arm, …
→ State machines (structure, feedback control)
Planning methods
grasp, re-grasp, pouring locations,
feasible trajectories, …
→ Optimization-based approach
Learning methods
Skill selection → Table, Softmax (sketched below)
Parameter adjustment
(e.g. shake axis) → Optimization (CMA-ES)
Improve plan quality → Improve value functions
27
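Below is a minimal sketch of the softmax skill selection mentioned above, assuming each skill keeps a running-average value estimate; the skill names, temperature, and reward signal are illustrative placeholders rather than the actual robot implementation. Parameter adjustment (e.g. the shake axis) is instead framed as black-box optimization, for which a CMA-ES library can be used in the same spirit.

```python
# Minimal sketch: softmax selection over a small skill library with a
# running-average value update.  Skill names, temperature, and the reward
# signal are illustrative assumptions, not the robot implementation.
import numpy as np

rng = np.random.default_rng(0)
skills = ['tipping', 'shaking_A', 'shaking_B']   # hypothetical flow-control skills
values = np.zeros(len(skills))                   # estimated value per skill
counts = np.zeros(len(skills))
tau = 0.2                                        # softmax temperature

def select_skill():
    p = np.exp(values / tau)
    return rng.choice(len(skills), p=p / p.sum())

def update(idx, reward):
    counts[idx] += 1
    values[idx] += (reward - values[idx]) / counts[idx]   # running average

for episode in range(20):
    i = select_skill()
    reward = rng.normal(loc=[0.2, 0.8, 0.5][i], scale=0.1)  # stand-in for the pouring outcome
    update(i, reward)
print(dict(zip(skills, values.round(2))))
```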
Sharing Knowledge Among Robots
28
The same implementation
worked on PR2 and Baxter
PR2 and Baxter:
Diff: Kinematics, grippers
Same: Arm DoF, sensors
Sharable knowledge:
Skills
Behavior structure
Not sharable:
Policy parameters
Achieved and NOT Achieved
Achieved:
Generalization of grasping, moving container, and
pouring skills
→ over container shapes
→ over initial container poses
→ over different target amounts
Adaptation of pouring skills
→ to new material types & container shapes
NOT achieved:
Generalization of pouring skills
→ over material types & container shapes
29
Reinforcement learning for generalization
30
Reinforcement Learning in Pouring
Components of pouring behavior:
Skill library: can be general
Behavior structure: can be general
Selection of skill and skill parameters: situation specific
→ Planning (dynamic programming) is necessary
Dynamics are partially unknown
→ Reinforcement Learning problem
31
Reinforcement Learning
32
(Diagram: a taxonomy of Reinforcement Learning. Direct Policy Search and
Value Function-based methods are model-free; Model-based methods plan with
Dynamic Programming / Optimization. What is learned: Policy / Value Functions /
Forward Models; planning depth: 0 / 1 / N; learning complexity: RL / RL / SL.)
33
[Direct Policy Search]  [Value Function-based]  [Model-based]
Model-free tends to obtain better performance
34
[Kober, Peters, 2011] [Kormushev, 2010]
Model-free is robust in POMDPs
35
Yamaguchi et al., "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
POMDP:
Partially Observable
Markov Decision
Process
Model-based suffers from simulation biases
36
Simulation bias: when forward models are inaccurate (as is usual when the
models are learned), integrating them causes a rapid increase in future-state
estimation errors
cf. [Atkeson, Schaal, 1997b] [Kober, Peters, 2013]
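As a toy illustration of this compounding effect (not from the slides), the snippet below rolls out a slightly mis-identified one-dimensional linear model against the true dynamics: the one-step error is small, but the multi-step prediction error grows quickly.

```python
# Toy illustration of simulation bias: a small one-step model error grows
# rapidly when the learned model is integrated over many steps.
a_true, a_model = 1.05, 1.08        # true vs. slightly mis-learned dynamics x' = a*x
x_true = x_pred = 1.0
for t in range(1, 21):
    x_true *= a_true                # real system
    x_pred *= a_model               # rollout of the learned model
    if t in (1, 5, 10, 20):
        print(f"t={t:2d}  prediction error = {abs(x_pred - x_true):.3f}")
```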
Model-based is good at generalization
37
(Figure: a forward-kinematics neural network (FK ANN) with input, hidden, and
output layers.)
Learning inverse kinematics of an android face
[Magtanong, Yamaguchi, et al., 2012]
Model-based is good at sharing / reusing
learned components
38
Forward models are sharable / reusable
Analytical models can be combined
Model-based is flexible to reward changes
39
Our Approach
Model-based reinforcement learning
How to deal with simulation biases?
Do not try to learn dx/dt = F(x,u) (dt: small, like xx ms)
Learn (sub)task-level dynamics
→ Parameters → F_grasp → Grasp result
→ Parameters → F_flow_ctrl → Flow ctrl result
Use stochastic models
→ Gaussian → F → Gaussian (sketched below)
Use stochastic dynamic programming
→ E.g. Stochastic (Differential) Dynamic Programming
How to work with a skill library?
40
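A minimal sketch of the "Gaussian → F → Gaussian" item above, using first-order linearization of a placeholder nonlinear forward model. The actual work uses stochastic neural networks (next slides); this only illustrates how an input Gaussian can be mapped to an output Gaussian.

```python
# Minimal sketch: propagate a Gaussian (mu, Sigma) through a nonlinear forward
# model F by first-order linearization.  F and the numbers are placeholders.
import numpy as np

def F(x):                                     # hypothetical (sub)task-level model
    return np.array([np.sin(x[0]) + 0.5 * x[1], 0.8 * x[1]])

def jacobian(f, x, eps=1e-5):                 # numerical Jacobian of f at x
    J = np.zeros((len(f(x)), len(x)))
    for i in range(len(x)):
        dx = np.zeros(len(x)); dx[i] = eps
        J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

mu = np.array([0.3, 1.0])                     # input mean
Sigma = np.diag([0.01, 0.04])                 # input covariance
J = jacobian(F, mu)
mu_out = F(mu)                                # output mean (first order)
Sigma_out = J @ Sigma @ J.T                   # output covariance (first order)
print(mu_out, Sigma_out)
```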
Model-based RL for Graph-Structured Dynamics
41
Learning Unknown Dynamical Systems with Stochastic Neural Networks
Planning Actions with Stochastic Graph-DDP
42
Forward model can be:
• Dynamical system with/without action parameters
• Kinematics
• Feature detection, policy parameterization
• Reward
• …
Bifurcation model can be:
• Possible different results of an action
• Skill selection
• Spatial decomposition of dynamics
• Spatial conversion, including kinematics, feature detection, policy parameterization, and rewards
• …
GraphDDP
Bifurcation primitive
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
43
GraphDDP
Bifurcation primitive
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
Skill selection
Possible different results of an action
Reward
Spatial decomposition of dynamics
44
GraphDDP
Bifurcation primitive
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
45
GraphDDP
Graph structure analysis → Tree DDP with multi-point search
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
46 [Yamaguchi and Atkeson, ICRA 2016]
Stochastic Neural Networks
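As a hedged sketch of the general idea (not the implementation from [Yamaguchi and Atkeson, ICRA 2016]): a small network that predicts the mean and a diagonal log-variance of the next state, trained with the Gaussian negative log-likelihood. Architecture, dimensions, and data are placeholders.

```python
# Minimal sketch: a neural network predicting a Gaussian over the next state,
# trained with the Gaussian negative log-likelihood.  Not the author's code;
# sizes and data are placeholders.
import torch
import torch.nn as nn

class GaussianDynamicsNet(nn.Module):
    def __init__(self, dim_in=4, dim_out=2, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim_in, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, dim_out)
        self.log_var = nn.Linear(hidden, dim_out)   # diagonal covariance

    def forward(self, x):
        h = self.body(x)
        return self.mean(h), self.log_var(h)

def nll(mean, log_var, target):
    # Gaussian negative log-likelihood (diagonal covariance, constants dropped)
    return 0.5 * (log_var + (target - mean) ** 2 / log_var.exp()).sum(dim=1).mean()

net = GaussianDynamicsNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(256, 4)          # placeholder (state, action) inputs
y = torch.randn(256, 2)          # placeholder next-state targets
for step in range(200):
    mean, log_var = net(x)
    loss = nll(mean, log_var, y)
    opt.zero_grad(); loss.backward(); opt.step()
```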
47
Works in real robots
Pouring Simulation with OpenDE
48
49
Achieved GENERALIZATION
over material variation and
container shapes
Decomposition of dynamics and
richer sensing are useful in learning
50
Example-1: Flow in Pouring
Do robots need to perceive FLOW in pouring?
51
Skill parameters → Flow → Poured amount
The robot can learn skill parameters to maximize reward (poured amount == target amount)
Considering decomposed dynamics (flow as an intermediate state) makes learning easier (sketched below)
52
(Figure: decomposed vs. not-decomposed dynamics)
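The sketch below illustrates the decomposition with made-up linear models: instead of fitting one opaque mapping from skill parameters to poured amount, two simpler mappings are fit and chained, with the observed flow supervising the intermediate stage. All names and data are placeholders, not the actual pouring dynamics.

```python
# Sketch: decomposed dynamics learning, skill parameters -> flow -> poured
# amount, where the observed flow supervises the intermediate model.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, size=(100, 2))                       # skill parameters
flow = np.tanh(theta @ np.array([1.5, -0.7])) + 0.05 * rng.normal(size=100)
amount = 2.0 * flow + 0.05 * rng.normal(size=100)              # poured amount

def fit_linear(X, y):
    Xb = np.c_[X, np.ones(len(X))]                             # add bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xq: np.c_[np.atleast_2d(Xq), np.ones(len(np.atleast_2d(Xq)))] @ w

f_flow = fit_linear(theta, flow)               # stage 1: parameters -> flow
f_amount = fit_linear(flow[:, None], amount)   # stage 2: flow -> amount

theta_new = np.array([[0.4, 0.9]])
flow_hat = f_flow(theta_new)
amount_hat = f_amount(flow_hat[:, None])
print("predicted flow:", flow_hat, "predicted amount:", amount_hat)
```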
How to Perceive Flow in Reality?
53
Example-2: Tactile Sensing in Manipulation
Is tactile sensing necessary in manipulation?
e.g. Google’s grasp learning: No tactile sensing; learning
visual servoing
What if grasping a container whose content is unknown?
What if external force is applied?
54
FingerVision: Vision-based Tactile Sensing
55
Multimodal tactile sensing
Force distribution
Proximity Vision
Slip / Deformation
Object pose, texture, shape
Low-cost and easy to
manufacture
Physically robust
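As a hedged sketch of the vision-based tactile principle (not the released FingerVision software): dark markers embedded in the transparent skin can be detected with OpenCV blob detection, and their displacement from a reference frame serves as a rough proxy for the force distribution. The camera index, detector settings, and the nearest-neighbor association are illustrative assumptions.

```python
# Hedged sketch: track dark markers on a transparent skin with OpenCV blob
# detection; marker displacement from a reference frame approximates the
# force distribution.  Camera index and association are placeholders.
import cv2
import numpy as np

detector = cv2.SimpleBlobDetector_create()       # default params detect dark blobs

def marker_positions(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return np.array([kp.pt for kp in detector.detect(gray)])   # (N, 2) pixels

def displacements(ref_pts, cur_pts):
    # Nearest-neighbor association (a real implementation tracks identities)
    return np.array([cur_pts[np.linalg.norm(cur_pts - p, axis=1).argmin()] - p
                     for p in ref_pts])

cap = cv2.VideoCapture(0)                        # camera looking at the finger skin
ok, ref_frame = cap.read()
assert ok, "camera not available"
ref_pts = marker_positions(ref_frame)

for _ in range(300):                             # process a few frames as a demo
    ok, frame = cap.read()
    if not ok:
        break
    cur_pts = marker_positions(frame)
    if len(cur_pts) and len(ref_pts):
        d = displacements(ref_pts, cur_pts)
        print("mean marker displacement (px):", d.mean(axis=0))
cap.release()
```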
Summary
Library of skills is essential
Skills and high-level behavior representations can be shared among robots
Considering pros & cons of reinforcement learning approaches is important
Model-free tends to obtain better performance
Model-free is robust in POMDPs
Model-based suffers from simulation biases
Model-based is good at generalization
Model-based is good at sharing / reusing learned components
Model-based is flexible to reward changes
A model-based reinforcement learning method for graph-structured dynamical
systems is proposed
Learning forward models with stochastic neural networks
Planning with stochastic Graph-DDP (differential dynamic programming)
Generalization of pouring behavior over material types is achieved
Decomposition of dynamics and richer sensing are useful in learning
More work: http://akihikoy.net
60
