AI-based Robotic Manipulation
Akihiko Yamaguchi(*1)
*1 Grad Schl of Info Sci, Tohoku University
The latest version of these slides is available at: http://akihikoy.net/p/rsj18.pdf
Goal of This Talk
Introducing my work on AI-based Robotic
Manipulation
Discussing AI applications in robot industry
Target: Researchers and engineers who understand
basic theory of robotics and machine learning
2
Can we build a robot that cooks like this?
If not, what is missing?
• AI/Software?
• Hardware?
• Sensors?
Everyday Manipulations are Difficult for Robots
Folding clothes
Cleaning
Cooking
Bathing
Dressing
…
5
Japanese way of folding T-shirts: https://youtu.be/b5AWQ5aBjgE
Chinese cooking skills: https://youtu.be/PFGGTPPNdRQ
What are the Difficulties?
Handling variations of:
Dynamics
✓ Non-rigid objects (Deformable, fragile, irregular shape, …)
✓ E.g. Vegetables, meats, liquids, cloths, …
✓ No good dynamical models
Situations
✓ Initial state, object properties, context, ...
✓ Each vegetable has different shape
Tasks
✓ Humans perform many different tasks every day
Hardware capability: Robot << Human body
Feasible tasks of robots << Feasible tasks of humans
Humans have much better hands (& sensors) than robots
6
What is Artificial Intelligence?
Many different methods are called “AI”
Optimization
Machine Learning
✓ Supervised learning
✓ Unsupervised learning
✓ Reinforcement learning
Reasoning
✓ Search
✓ Motion planning
…
None of the above is AI
OR all programs are AI (even “if x>0 then y else z” is a minimal AI)
7
AI can do many things
Many (AI) methods have been developed for many tasks
Why is AI Useful in Robotic Manipulation?
Handling variations:
Learning dynamics and tasks
Adapting to new situations
Generalizing the behavior to new situations and tasks
Machine Learning: Tools for adaptation
Reasoning: Tools for generalization
Optimization: Most fundamental tools
8
AI is a solution to handle variations
Hardware vs. AI (Software)
General robot arms: Available
6+ DoF arms
General robot hands: Not available
Existing dexterous robot hands do not cover the tasks that humans
can do
Good vision: Available
Good cameras
Good tactile sensors: Not available
No de-facto standard tactile sensors
9
General AI (for manipulation) research needs
General Hardware (arms, hands, sensors)
*But we don’t know what general hardware is
AI for Robotic Manipulation
10
Goal: Finding a policy to perform a given task
Dynamic Programming & Reinforcement Learning
11
(Example rewards: moving forward, tasty, “I am satisfied ☺”, ….)
Dynamic Programming when {Fk} are given
Reinforcement Learning when {Fk} are unknown
Robotic manipulation is generally formulated
as a reinforcement learning problem
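For concreteness, one standard finite-horizon formulation consistent with the {Fk} notation above (the reward R_k, horizon N, and the expectation are added here for illustration):

```latex
x_{k+1} = F_k(x_k, u_k), \qquad
J(\pi) = \mathbb{E}\!\left[\sum_{k=0}^{N-1} R_k(x_k, u_k)\right], \qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```

Dynamic programming computes π* by backward recursion when the {Fk} are known; reinforcement learning must estimate a policy (and/or value functions or models) from sampled interaction when they are not.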
Hypothesis?
12
If we have a general reinforcement learning method,
robots can learn any (robotic manipulation) task
Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
Deep Reinforcement Learning
Deep learning: with big data, a neural network can learn (almost) any
I/O mapping to (almost) any precision. We do not have to worry about
how large the state space is, and it can take raw images as input
without hand-designed features.
Deep RL: by using deep NNs to represent policies, dynamics models,
or value functions, deep RL can handle large state spaces given big
data.
15
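As a minimal sketch of the “deep NN as policy” idea (assuming PyTorch; the layer sizes, the 64x64 image, and the 7-dimensional action are illustrative choices, not details from the talk):

```python
import torch
import torch.nn as nn

class ImagePolicy(nn.Module):
    """Maps a raw camera image directly to a continuous action (no hand-designed features)."""
    def __init__(self, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image):
        return self.head(self.encoder(image))

policy = ImagePolicy()
action = policy(torch.zeros(1, 3, 64, 64))   # dummy 64x64 RGB observation
print(action.shape)                          # torch.Size([1, 7])
```

The same network body could just as well parameterize a value function or a dynamics model; the deep RL works cited on the next slide differ mainly in which of these they learn and from what data.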
Deep Reinforcement Learning
16
(T-L) Learning to play Atari
games by Google DeepMind,
Mnih et al. 2015
https://youtu.be/cjpEIotvwFY
(T-R) DeepMPC Robotic
Experiments - PR2 cuts food,
Lenz et al. 2015
https://youtu.be/BwA90MmkvPU
(B-L) Learning to grasp from
50K Tries, Pinto et al. 2016
https://youtu.be/oSqHc0nLkm8
(B-R) Learning hand-eye
coordination for robotic
grasping, Levine et al. 2017
https://youtu.be/l8zKZLqkfII
17
Rajeswaran, Kumar, Gupta, Schulman, Todorov, Levine: Learning Complex Dexterous Manipulation with Deep
Reinforcement Learning and Demonstrations
https://sites.google.com/view/deeprl-dexterous-manipulation
What is Good AI for Robotic Manipulation?
Task achievement (sum of rewards)
Learning speed (Number of samples)
Key axes to measure AI:
(from a talk by Leslie Kaelbling at ICRA’16)
Adaptability
Generalization ability
Scalability
18
A learning curve (performance vs. experience) is used to measure learning performance
What is a Promising Approach?
Deep (Reinforcement) Learning?
End-to-End Learning?
Imitation Learning?
…
19
No promising approach has been proposed yet
Baxter peels banana https://youtu.be/rEeixPBd3hc
Hypothesis: AI-based Robot Manipulation
Library of skills is essential
Having many alternative strategies
Reasoning and learning are core tools
Structured knowledge should be introduced
Skills, dynamics models, policies, …
Unified approach is the way to go
Hybrid of model-based and model-free
Multiple representations: continuous, primitive, symbolic
(and, unexplored stuff: e.g. Perception skills)
21
Library of skills is essential
22
Image sources:
http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg
http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg
http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg
http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg
http://www.nescafe.com/upload/golden_roast_f_711.png
Pouring Behavior with Skill Library
Skill library
flow ctrl (tip, shake, …), grasp, move arm, …
→ State machines (structure, feedback control)
Planning methods
grasp, re-grasp, pouring locations,
feasible trajectories, …
→ Optimization-based approach
Learning methods
Skill selection → Table, Softmax
Parameter adjustment
(e.g. shake axis) → Optimization (CMA-ES)
Improve plan quality → Improve value functions
25
[Yamaguchi et al. IJHR 2015]
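A toy sketch of the two learning mechanisms listed above, softmax skill selection and black-box parameter adjustment (a simple Gaussian-perturbation hill climber stands in for CMA-ES here; the skill names and the scoring function are invented for illustration):

```python
import math, random

skills = ["tip", "shake_A", "shake_B"]            # flow-control skill candidates
value = {s: 0.0 for s in skills}                  # learned value of each skill

def softmax_select(tau=0.5):
    """Pick a skill with probability proportional to exp(value/tau)."""
    weights = [math.exp(value[s] / tau) for s in skills]
    r, acc = random.uniform(0.0, sum(weights)), 0.0
    for s, w in zip(skills, weights):
        acc += w
        if r <= acc:
            return s
    return skills[-1]

def adjust_parameter(score, theta0=0.0, sigma=0.3, iters=20):
    """Stand-in for CMA-ES: locally perturb a skill parameter (e.g. the shake axis)."""
    theta = theta0
    for _ in range(iters):
        cand = theta + random.gauss(0.0, sigma)
        if score(cand) > score(theta):
            theta = cand
    return theta

# One episode: select a skill, tune its parameter, then update the skill's value estimate.
skill = softmax_select()
theta = adjust_parameter(lambda th: -abs(th - 1.0))   # fake reward peaking at theta = 1.0
reward = -abs(theta - 1.0)
value[skill] += 0.5 * (reward - value[skill])
print(skill, round(theta, 2), round(value[skill], 2))
```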
Sharing Knowledge Among Robots
30
The same implementation
worked on PR2 and Baxter
PR2 and Baxter:
Diff: Kinematics, grippers
Same: Arm DoF, sensors
Sharable knowledge:
Skills
Behavior structure
Not sharable:
Policy parameters
How Good is This AI?
Scalability
Framework: Applicable to many tasks (to be verified)
Skills: Reusable in many contexts (to be verified)
Adaptability
Adapted skill parameters and skill selections in a few episodes
Simple machine learning and optimization tools worked
Generalization ability
Generalized behaviors over traditional robotic manipulations
(e.g. grasping & moving containers)
Could not generalize over non-rigid objects (liquids)
31
Reinforcement Learning with Skill Library
for Generalization
32
Reinforcement Learning with Skill Library
Components:
Library of skills
✓ Skill = Parameterized Policy
Behavior graph
✓ Graph consisting of skills
✓ Execution: Need to decide skill parameters and skill selections
Dynamics models
✓ Partially known, partially unknown
33
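One possible way to put these components into code (a sketch only; the dataclass layout and the skill/edge names are assumptions, not the implementation behind the results above):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Skill:
    """Skill = parameterized policy: (state, params) -> low-level command."""
    name: str
    policy: Callable[[dict, list], object]
    n_params: int

@dataclass
class BehaviorGraph:
    """Graph of skills; executing it requires choosing skill selections and parameters."""
    skills: Dict[str, Skill] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)   # skill -> possible next skills

    def add(self, skill: Skill, next_skills: List[str]):
        self.skills[skill.name] = skill
        self.edges[skill.name] = next_skills

graph = BehaviorGraph()
graph.add(Skill("grasp",    lambda s, p: ("close_gripper", p), 2), ["move_arm"])
graph.add(Skill("move_arm", lambda s, p: ("goto", p),          3), ["tip", "shake"])
graph.add(Skill("tip",      lambda s, p: ("rotate", p),        1), [])
graph.add(Skill("shake",    lambda s, p: ("oscillate", p),     2), [])
print(graph.edges["move_arm"])   # ['tip', 'shake']
```

Dynamics models (the partially known, partially unknown part) would then be attached per edge, predicting the outcome of each skill from its parameters.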
Model-based RL vs. Model-free RL
34
[Figure: a spectrum of reinforcement learning approaches — what is learned (policy, value functions, forward models), planning depth (0, 1, …, N), and learning complexity — relating direct policy search and value-function-based RL (model-free) to model-based RL with dynamic programming / optimization]
Model-free tends to obtain better performance
35
[Kober,Peters,2011] [Kormushev,2010]
Model-free is robust in POMDP
36
Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
POMDP:
Partially Observable
Markov Decision
Process
Model-based suffers from simulation biases
37
Simulation bias: when forward models are inaccurate (as is usual when
models are learned), integrating the forward models causes a rapid
increase in future state estimation errors
cf. [Atkeson,Schaal,1997b][Kober,Peters,2013]
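A tiny numerical illustration of how such errors compound (the dynamics and the size of the modeling error are made up; this is not from the cited papers):

```python
# True system: x' = 0.9*x + u.  Learned model is slightly wrong: 0.95 instead of 0.9.
def true_step(x, u):    return 0.90 * x + u
def learned_step(x, u): return 0.95 * x + u     # small one-step modeling error

x_true, x_pred, u = 1.0, 1.0, 0.1
for k in range(30):
    x_true = true_step(x_true, u)
    x_pred = learned_step(x_pred, u)            # open-loop rollout of the learned model
    if (k + 1) % 10 == 0:
        print(f"step {k+1:2d}: prediction error = {abs(x_pred - x_true):.3f}")
```

Planning against such rolled-out predictions optimizes for states the real robot never reaches, which is exactly the bias described above.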
Model-based is good at generalization
38
[Figure: forward-kinematics ANN (input, hidden, output layers) used in an iterative update loop]
Learning inverse kinematics of android face
[Magtanong, Yamaguchi, et al. 2012]
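The generalization benefit comes from learning a forward model once and reusing it for arbitrary queries. A minimal sketch of the mechanism the figure suggests, inverting a learned FK network by iteratively updating its input (PyTorch; the untrained stand-in network and its dimensions are placeholders, not the published model):

```python
import torch
import torch.nn as nn

# Stand-in for a learned forward-kinematics model: actuator command u -> face landmarks y.
fk_model = nn.Sequential(nn.Linear(5, 32), nn.Tanh(), nn.Linear(32, 8))

# "Inverse kinematics" by inverting the learned forward model:
# freeze the model and update the command u until the predicted landmarks match a target.
target = torch.zeros(8)
u = torch.zeros(5, requires_grad=True)
opt = torch.optim.Adam([u], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = ((fk_model(u) - target) ** 2).mean()
    loss.backward()
    opt.step()
print("final error:", float(loss))
```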
Model-based is good at sharing / reusing
learned components
39
✓ Forward models are sharable / reusable
✓ Analytical models can be combined
Model-based is flexible to reward changes
40
Model-based RL for Graph-Structured Dynamics
Model-based reinforcement learning
How to deal with simulation biases?
Do not try to learn dx/dt = F(x,u) (dt: small like xx ms)
Learn (sub)task-level dynamics
✓ Parameters → F_grasp → Grasp result
✓ Parameters → F_flow_ctrl → Flow ctrl result
Use stochastic models
✓ Gaussian → F → Gaussian
Use stochastic dynamic programming
✓ Stochastic Differential Dynamic Programming (DDP)
How to work with a skill library?
Dynamic Programming for graph-structured
dynamical systems
41
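One simple way to realize the “Gaussian → F → Gaussian” step is first-order moment matching through a (possibly learned) task-level model, sketched below. The toy dynamics F, the numerical Jacobian, and the numbers are illustrative assumptions; the actual work uses stochastic neural networks and Graph-DDP:

```python
import numpy as np

def F(x):
    """Toy (sub)task-level dynamics, e.g. (pour angle, duration) -> (poured amount, spill)."""
    return np.array([x[0] * x[1], 0.1 * x[0] ** 2])

def propagate_gaussian(F, mean, cov, eps=1e-5):
    """Push a Gaussian through F by linearizing with a numerical Jacobian."""
    y0 = F(mean)
    J = np.zeros((len(y0), len(mean)))
    for i in range(len(mean)):
        d = np.zeros_like(mean)
        d[i] = eps
        J[:, i] = (F(mean + d) - y0) / eps
    return y0, J @ cov @ J.T

mean_in = np.array([0.6, 2.0])           # Gaussian over the skill parameters
cov_in = np.diag([0.01, 0.04])
mean_out, cov_out = propagate_gaussian(F, mean_in, cov_in)
print(mean_out, np.diag(cov_out))
```

Chaining such one-step propagations along the behavior graph yields the predicted outcome distributions that the graph-structured dynamic programming optimizes over.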
Model-based RL for Graph-Structured Dynamics
42
Learning Unknown
Dynamical Systems
with Stochastic
Neural Networks
Planning Actions
with Graph-DDP
43 [Yamaguchi and Atkeson, ICRA 2016]
Stochastic Neural Networks
44
Graph-DDP
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
45
Works in real robots
Pouring Simulation with OpenDE
46
47
Achieved GENERALIZATION
over material variation and
container shapes
AI Approach for Robot Industry
48
When Will Human-level Robots Show Up?
Breakthroughs so far
Image recognition with deep learning
Machine translation
…
Breakthroughs needed for robotic manipulation
Perception for manipulation
✓ Liquid recognition (e.g. D. Fox), Component recognition, Quantity estimation (e.g. Burgard),
Deformation recognition, …
Integration of structured knowledge (skill library, …)
Hybrid of model-based and model-free RL
Multiple representations: continuous, primitive, symbolic
Reasoning about failure recovery
Hardware for general manipulation (robot hands, tactile sensors, tools for robots)
…
49
Many breakthroughs are still necessary in AI for robotic
manipulation. Human-level ability is still far away.
Method-driven vs. Idea-driven vs. Task-driven
Method-driven
Starting point is a method (AI)
Idea-driven
Starting point is an idea (AI-based technology)
Many of current deep learning applications are this type
Task-driven
Starting point is a task
AI might not be the best way
50
On-sight vs. Off-sight, On-line vs. Off-line
On-sight: Using AI in the field
Off-sight: Using AI outside the field
On-line: Sampling and learning simultaneously
Off-line: Sampling and learning separately
51
7 Things to Know Before Using AI
Why does AI work? Because the AI engineer carefully designed the
task through trial and error. An engineer who does not know the task
well cannot apply AI to it.
No AI covers all tasks.
AI is too wide an area. It is difficult to find a single expert who covers all
methods (machine learning, reasoning, optimization, …) and all
domains (robotics, computer vision, natural language processing, …).
Guaranteeing the completion of task is hard.
Guaranteeing the generalization of learned models is hard.
In many robotic applications, improving hardware >> AI solution
(e.g. adding sensors, improving mechanisms).
Humans are underrated (Elon Musk). AI and robots are overrated.
52
Emphasizing the Importance of Tactile Sensing
[Sensing modalities shown: proximity vision, force, slip, tactile]
53
An optical skin sensor was useful for automating a cutting behavior
Future AI x Robotics for Robotic Manipulation
Many years will likely be necessary for AI and robots to acquire
human-level manipulation ability
Need a lot of fundamental research
Unifying many theories: AI, Robotics, Control, Computer Vision…
Need general robot hands with tactile sensors
Education is important for sustainable development
Should increase the number of robotics x AI researchers
Case studies and competitions will boost the research
DRC, ARC, RoboCup, WRS, XPRIZE, …
Household activities (e.g. robot cooking), assembly, …
56
