AI-based Robotic Manipulation
Akihiko Yamaguchi(*1)
*1 Grad Schl of Info Sci, Tohoku University
The latest version of these slides is available at: http://akihikoy.net/p/rsj18.pdf
Goal of This Talk
Introducing my work on AI-based Robotic
Manipulation
Discussing AI applications in robot industry
Target: Researchers and engineers who understand
basic theory of robotics and machine learning
2
Can we build a robot that cooks like this?
If not, what is missing?
• AI/Software?
• Hardware?
• Sensors?
Everyday Manipulations are Difficult for Robots
Folding clothes
Cleaning
Cooking
Bathing
Dressing
…
5
Japanese way of folding T-shirts: https://youtu.be/b5AWQ5aBjgE
Chinese cooking skills: https://youtu.be/PFGGTPPNdRQ
What are the Difficulties?
Handling variations of:
Dynamics
✓ Non-rigid objects (Deformable, fragile, irregular shape, …)
✓ E.g. Vegetables, meats, liquids, cloths, …
✓ No good dynamical models
Situations
✓ Initial state, object properties, context, ...
✓ Each vegetable has different shape
Tasks
✓ Humans perform many different tasks every day
Hardware capability: Robot << Human body
Feasible tasks of robots << Feasible tasks of humans
Humans have much better hands (& sensors) than robots
6
What is Artificial Intelligence?
Many different methods are called “AI”
Optimization
Machine Learning
✓ Supervised learning
✓ Unsupervised learning
✓ Reinforcement learning
Reasoning
✓ Search
✓ Motion planning
…
None of the above is AI
OR all programs are AI (even “if x>0 then y else z” is a minimal AI)
7
AI can do many things
Many (AI) methods have been developed for many tasks
Why is AI Useful in Robotic Manipulation?
Handling variations:
Learning dynamics and tasks
Adapting to new situations
Generalizing the behavior to new situations and tasks
Machine Learning: Tools for adaptation
Reasoning: Tools for generalization
Optimization: Most fundamental tools
8
AI is a solution to handle variations
Hardware vs. AI (Software)
General robot arms: Available
6+ DoF arms
General robot hands: Not available
Existing dexterous robot hands do not cover the tasks that humans
can do
Good vision: Available
Good cameras
Good tactile sensors: Not available
No de-facto standard tactile sensors
9
General AI (for manipulation) research needs
General Hardware (arms, hands, sensors)
*But we don’t know what general hardware is
AI for Robotic Manipulation
10
Goal: Finding a policy to perform a given task
Dynamic Programming & Reinforcement Learning
11
(Example rewards: moving forward, tasty, “I am satisfied ☺”, ….)
Dynamic Programming when {Fk} are given
Reinforcement Learning when {Fk} are unknown
Robotic manipulation is generally formulated
as a reinforcement learning problem
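For concreteness, one standard finite-horizon formulation consistent with the {Fk} notation above (the reward R_k, horizon N, and the expectation are added here for illustration):

```latex
x_{k+1} = F_k(x_k, u_k), \qquad
J(\pi) = \mathbb{E}\!\left[\sum_{k=0}^{N-1} R_k(x_k, u_k)\right], \qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```

Dynamic programming computes π* by backward recursion when the {Fk} are known; reinforcement learning must estimate a policy (and/or value functions or models) from sampled interaction when they are not.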
Hypothesis?
12
If we have a general reinforcement learning method,
robots can learn any (robotic manipulation) task
Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
Deep Reinforcement Learning
Deep learning: with big data, a neural network can learn (almost) any
I/O mapping to (almost) any precision. We do not have to worry about
how large the state space is, and it can take raw images as input
without hand-designed features.
Deep RL: by using deep NNs to represent policies, dynamics models,
or value functions, deep RL can handle large state spaces given big
data.
15
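As a minimal sketch of the “deep NN as policy” idea (assuming PyTorch; the layer sizes, the 64x64 image, and the 7-dimensional action are illustrative choices, not details from the talk):

```python
import torch
import torch.nn as nn

class ImagePolicy(nn.Module):
    """Maps a raw camera image directly to a continuous action (no hand-designed features)."""
    def __init__(self, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image):
        return self.head(self.encoder(image))

policy = ImagePolicy()
action = policy(torch.zeros(1, 3, 64, 64))   # dummy 64x64 RGB observation
print(action.shape)                          # torch.Size([1, 7])
```

The same network body could just as well parameterize a value function or a dynamics model; the deep RL works cited on the next slide differ mainly in which of these they learn and from what data.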
Deep Reinforcement Learning
16
(T-L) Learning to play Atari
games by Google DeepMind,
Mnih et al. 2015
https://youtu.be/cjpEIotvwFY
(T-R) DeepMPC Robotic
Experiments - PR2 cuts food,
Lenz et al. 2015
https://youtu.be/BwA90MmkvPU
(B-L) Learning to grasp from
50K Tries, Pinto et al. 2016
https://youtu.be/oSqHc0nLkm8
(B-R) Learning hand-eye
coordination for robotic
grasping, Levine et al. 2017
https://youtu.be/l8zKZLqkfII
17
Rajeswaran, Kumar, Gupta, Schulman, Todorov, Levine: Learning Complex Dexterous Manipulation with Deep
Reinforcement Learning and Demonstrations
https://sites.google.com/view/deeprl-dexterous-manipulation
What is Good AI for Robotic Manipulation?
Task achievement (sum of rewards)
Learning speed (Number of samples)
Key axes to measure AI:
(from a talk by Leslie Kaelbling at ICRA’16)
Adaptability
Generalization ability
Scalability
18
A learning curve (performance vs. experience) is used to measure learning performance
What is a Promising Approach?
Deep (Reinforcement) Learning?
End-to-End Learning?
Imitation Learning?
…
19
No promising approach has been proposed yet
Baxter peels banana https://youtu.be/rEeixPBd3hc
Hypothesis: AI-based Robot Manipulation
Library of skills is essential
Having many alternative strategies
Reasoning and learning are core tools
Structured knowledge should be introduced
Skills, dynamics models, policies, …
Unified approach is the way to go
Hybrid of model-based and model-free
Multiple representations: continuous, primitive, symbolic
(and, unexplored stuff: e.g. Perception skills)
21
Library of skills is essential
22
Image sources:
http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg
http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg
http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg
http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg
http://www.nescafe.com/upload/golden_roast_f_711.png
Pouring Behavior with Skill Library
Skill library
flow ctrl (tip, shake, …), grasp, move arm, …
→ State machines (structure, feedback control)
Planning methods
grasp, re-grasp, pouring locations,
feasible trajectories, …
→ Optimization-based approach
Learning methods
Skill selection → Table, Softmax
Parameter adjustment
(e.g. shake axis) → Optimization (CMA-ES)
Improve plan quality → Improve value functions
25
[Yamaguchi et al. IJHR 2015]
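A toy sketch of the two learning mechanisms listed above, softmax skill selection and black-box parameter adjustment (a simple Gaussian-perturbation hill climber stands in for CMA-ES here; the skill names and the scoring function are invented for illustration):

```python
import math, random

skills = ["tip", "shake_A", "shake_B"]            # flow-control skill candidates
value = {s: 0.0 for s in skills}                  # learned value of each skill

def softmax_select(tau=0.5):
    """Pick a skill with probability proportional to exp(value/tau)."""
    weights = [math.exp(value[s] / tau) for s in skills]
    r, acc = random.uniform(0.0, sum(weights)), 0.0
    for s, w in zip(skills, weights):
        acc += w
        if r <= acc:
            return s
    return skills[-1]

def adjust_parameter(score, theta0=0.0, sigma=0.3, iters=20):
    """Stand-in for CMA-ES: locally perturb a skill parameter (e.g. the shake axis)."""
    theta = theta0
    for _ in range(iters):
        cand = theta + random.gauss(0.0, sigma)
        if score(cand) > score(theta):
            theta = cand
    return theta

# One episode: select a skill, tune its parameter, then update the skill's value estimate.
skill = softmax_select()
theta = adjust_parameter(lambda th: -abs(th - 1.0))   # fake reward peaking at theta = 1.0
reward = -abs(theta - 1.0)
value[skill] += 0.5 * (reward - value[skill])
print(skill, round(theta, 2), round(value[skill], 2))
```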
Sharing Knowledge Among Robots
30
The same implementation
worked on PR2 and Baxter
PR2 and Baxter:
Diff: Kinematics, grippers
Same: Arm DoF, sensors
Sharable knowledge:
Skills
Behavior structure
Not sharable:
Policy parameters
How Good is This AI?
Scalability
Framework: Applicable to many tasks (to be verified)
Skills: Reusable in many contexts (to be verified)
Adaptability
Adapted skill parameters and skill selections in a few episodes
Simple machine learning and optimization tools worked
Generalization ability
Generalized behaviors over traditional robotic manipulations
(e.g. grasping & moving containers)
Could not generalize over non-rigid objects (liquids)
31
Reinforcement Learning with Skill Library
for Generalization
32
Reinforcement Learning with Skill Library
Components:
Library of skills
✓ Skill = Parameterized Policy
Behavior graph
✓ Graph consisting of skills
✓ Execution: Need to decide skill parameters and skill selections
Dynamics models
✓ Partially known, partially unknown
33
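One possible way to put these components into code (a sketch only; the dataclass layout and the skill/edge names are assumptions, not the implementation behind the results above):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Skill:
    """Skill = parameterized policy: (state, params) -> low-level command."""
    name: str
    policy: Callable[[dict, list], object]
    n_params: int

@dataclass
class BehaviorGraph:
    """Graph of skills; executing it requires choosing skill selections and parameters."""
    skills: Dict[str, Skill] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)   # skill -> possible next skills

    def add(self, skill: Skill, next_skills: List[str]):
        self.skills[skill.name] = skill
        self.edges[skill.name] = next_skills

graph = BehaviorGraph()
graph.add(Skill("grasp",    lambda s, p: ("close_gripper", p), 2), ["move_arm"])
graph.add(Skill("move_arm", lambda s, p: ("goto", p),          3), ["tip", "shake"])
graph.add(Skill("tip",      lambda s, p: ("rotate", p),        1), [])
graph.add(Skill("shake",    lambda s, p: ("oscillate", p),     2), [])
print(graph.edges["move_arm"])   # ['tip', 'shake']
```

Dynamics models (the partially known, partially unknown part) would then be attached per edge, predicting the outcome of each skill from its parameters.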
Model-based RL vs. Model-free RL
34
[Figure: a spectrum of reinforcement learning approaches — what is learned (policy, value functions, forward models), planning depth (0, 1, …, N), and learning complexity — relating direct policy search and value-function-based RL (model-free) to model-based RL with dynamic programming / optimization]
Model-free tends to obtain better performance
35
[Kober,Peters,2011] [Kormushev,2010]
Model-free is robust in POMDP
36
Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
POMDP:
Partially Observable
Markov Decision
Process
Model-based suffers from simulation biases
37
Simulation bias: when forward models are inaccurate (as is usual when
models are learned), integrating the forward models causes a rapid
increase in future state estimation errors
cf. [Atkeson,Schaal,1997b][Kober,Peters,2013]
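A tiny numerical illustration of how such errors compound (the dynamics and the size of the modeling error are made up; this is not from the cited papers):

```python
# True system: x' = 0.9*x + u.  Learned model is slightly wrong: 0.95 instead of 0.9.
def true_step(x, u):    return 0.90 * x + u
def learned_step(x, u): return 0.95 * x + u     # small one-step modeling error

x_true, x_pred, u = 1.0, 1.0, 0.1
for k in range(30):
    x_true = true_step(x_true, u)
    x_pred = learned_step(x_pred, u)            # open-loop rollout of the learned model
    if (k + 1) % 10 == 0:
        print(f"step {k+1:2d}: prediction error = {abs(x_pred - x_true):.3f}")
```

Planning against such rolled-out predictions optimizes for states the real robot never reaches, which is exactly the bias described above.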
Model-based is good at generalization
38
[Figure: forward-kinematics ANN (input, hidden, output layers) used in an iterative update loop]
Learning inverse kinematics of android face
[Magtanong, Yamaguchi, et al. 2012]
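The generalization benefit comes from learning a forward model once and reusing it for arbitrary queries. A minimal sketch of the mechanism the figure suggests, inverting a learned FK network by iteratively updating its input (PyTorch; the untrained stand-in network and its dimensions are placeholders, not the published model):

```python
import torch
import torch.nn as nn

# Stand-in for a learned forward-kinematics model: actuator command u -> face landmarks y.
fk_model = nn.Sequential(nn.Linear(5, 32), nn.Tanh(), nn.Linear(32, 8))

# "Inverse kinematics" by inverting the learned forward model:
# freeze the model and update the command u until the predicted landmarks match a target.
target = torch.zeros(8)
u = torch.zeros(5, requires_grad=True)
opt = torch.optim.Adam([u], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = ((fk_model(u) - target) ** 2).mean()
    loss.backward()
    opt.step()
print("final error:", float(loss))
```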
Model-based is good at sharing / reusing
learned components
39
✓ Forward models are sharable / reusable
✓ Analytical models can be combined
Model-based is flexible to reward changes
40
Model-based RL for Graph-Structured Dynamics
Model-based reinforcement learning
How to deal with simulation biases?
Do not try to learn dx/dt = F(x,u) (dt: small like xx ms)
Learn (sub)task-level dynamics
✓ Parameters → F_grasp → Grasp result
✓ Parameters → F_flow_ctrl → Flow ctrl result
Use stochastic models
✓ Gaussian → F → Gaussian
Use stochastic dynamic programming
✓ Stochastic Differential Dynamic Programming (DDP)
How to work with a skill library?
Dynamic Programming for graph-structured
dynamical systems
41
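One simple way to realize the “Gaussian → F → Gaussian” step is first-order moment matching through a (possibly learned) task-level model, sketched below. The toy dynamics F, the numerical Jacobian, and the numbers are illustrative assumptions; the actual work uses stochastic neural networks and Graph-DDP:

```python
import numpy as np

def F(x):
    """Toy (sub)task-level dynamics, e.g. (pour angle, duration) -> (poured amount, spill)."""
    return np.array([x[0] * x[1], 0.1 * x[0] ** 2])

def propagate_gaussian(F, mean, cov, eps=1e-5):
    """Push a Gaussian through F by linearizing with a numerical Jacobian."""
    y0 = F(mean)
    J = np.zeros((len(y0), len(mean)))
    for i in range(len(mean)):
        d = np.zeros_like(mean)
        d[i] = eps
        J[:, i] = (F(mean + d) - y0) / eps
    return y0, J @ cov @ J.T

mean_in = np.array([0.6, 2.0])           # Gaussian over the skill parameters
cov_in = np.diag([0.01, 0.04])
mean_out, cov_out = propagate_gaussian(F, mean_in, cov_in)
print(mean_out, np.diag(cov_out))
```

Chaining such one-step propagations along the behavior graph yields the predicted outcome distributions that the graph-structured dynamic programming optimizes over.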
Model-based RL for Graph-Structured Dynamics
42
Learning Unknown
Dynamical Systems
with Stochastic
Neural Networks
Planning Actions
with Graph-DDP
43 [Yamaguchi and Atkeson, ICRA 2016]
Stochastic Neural Networks
44
Graph-DDP
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
45
Works in real robots
Pouring Simulation with OpenDE
46
47
Achieved GENERALIZATION
over material variation and
container shapes
AI Approach for Robot Industry
48
When Will Human-level Robots Show Up?
Breakthroughs so far
Image recognition with deep learning
Machine translation
…
Breakthroughs needed for robotic manipulation
Perception for manipulation
✓ Liquid recognition (e.g. D. Fox), Component recognition, Quantity estimation (e.g. Burgard),
Deformation recognition, …
Integration of structured knowledge (skill library, …)
Hybrid of model-based and model-free RL
Multiple representations: continuous, primitive, symbolic
Reasoning about failure recovery
Hardware for general manipulation (robot hands, tactile sensors, tools for robots)
…
49
Many breakthroughs are still necessary in AI for robotic
manipulation. Human-level ability is still far away.
Method-driven vs. Idea-driven vs. Task-driven
Method-driven
Starting point is a method (AI)
Idea-driven
Starting point is an idea (AI-based technology)
Many of current deep learning applications are this type
Task-driven
Starting point is a task
AI might not be the best way
50
On-sight vs. Off-sight, On-line vs. Off-line
On-sight: Using AI in the field
Off-sight: Using AI outside the field
On-line: Sampling and learning simultaneously
Off-line: Sampling and learning separately
51
7 Things to Know Before Using AI
Why does AI work? Because the AI engineer carefully designed the
task through trial and error. An engineer who does not know the task
well cannot apply AI to it.
No AI covers all tasks.
AI is too wide an area. It is difficult to find a single expert who covers all
methods (machine learning, reasoning, optimization, …) and all
domains (robotics, computer vision, natural language processing, …).
Guaranteeing the completion of task is hard.
Guaranteeing the generalization of learned models is hard.
In many robotic applications, improving hardware >> AI solution
(e.g. adding sensors, improving mechanisms).
Humans are underrated (Elon Musk). AI and robots are overrated.
52
Emphasizing the Importance of Tactile Sensing
[Sensing modalities shown: proximity vision, force, slip, tactile]
53
An optical skin sensor was useful for automating a cutting behavior
Future AI x Robotics for Robotic Manipulation
Many years will likely be necessary for AI and robots to acquire
human-level manipulation ability
Need a lot of fundamental research
Unifying many theories: AI, Robotics, Control, Computer Vision…
Need general robot hands with tactile sensors
Education is important for sustainable development
Should increase the number of robotics x AI researchers
Case studies and competitions will boost the research
DRC, ARC, RoboCup, WRS, XPRIZE, …
Household activities (e.g. robot cooking), assembly, …
56
