SlideShare a Scribd company logo
1 of 3
Robots are Learning to Pick up
Objects Like Babies
In the future, this technology could help self-driving cars anticipate
future events on the road and produce more intelligent robotic
assistants in homes, but the initial prototype focuses on learning simple
manual skills entirely from autonomous play.
Using this technology, called visual foresight, the robots can predict
what their cameras will see if they perform a particular sequence of
movements. These robotic imaginations are still relatively simple for now
– predictions made only several seconds into the future – but they are
enough for the robot to figure out how to move objects around on a
table without disturbing obstacles.
Crucially, the robot can learn to perform these tasks without any help
from humans or prior knowledge about physics, its environment or what
the objects are. That’s because the visual imagination is learned entirely
from scratch from unattended and unsupervised exploration, where the
robot plays with objects on a table. After this play phase, the robot builds
a predictive model of the world, and can use this model to manipulate
new objects that it has not seen before.
“In the same way that we can imagine how our actions will move the
objects in our environment, this method can enable a robot to visualize
how different behaviors will affect the world around it,” said Sergey
Levine, assistant professor in Berkeley’s Department of Electrical
Engineering and Computer Sciences, whose lab developed the
technology. “This can enable intelligent planning of highly flexible skills
in complex real-world situations.”
At the core of this system is a deep learning technology based on
convolutional recurrent video prediction, or dynamic neural advection
(DNA). DNA-based models predict how pixels in an image will move
from one frame to the next based on the robot’s actions. Recent
improvements to this class of models, as well as greatly improved
planning capabilities, have enabled robotic control based on video
prediction to perform increasingly complex tasks, such as sliding toys
around obstacles and repositioning multiple objects.
In that past, robots have learned skills with a human supervisor helping
and providing feedback. What makes this work exciting is that the robots
can learn a range of visual object manipulation skills entirely on their
own,” said Chelsea Finn, a doctoral student in Levine’s lab and inventor
of the original DNA model.
With the new technology, a robot pushes objects on a table, then uses
the learned prediction model to choose motions that will move an object
to a desired location. Robots use the learned model from raw camera
observations to teach themselves how to avoid obstacles and push
objects around obstructions.
“Humans learn object manipulation skills without any teacher through
millions of interactions with a variety of objects during their lifetime. We
have shown that it possible to build a robotic system that also leverages
large amounts of autonomously collected data to learn widely applicable
manipulation skills, specifically object pushing skills,” said Frederik Ebert,
a graduate student in Levine’s lab who worked on the project.
Since control through video prediction relies only on observations that
can be collected autonomously by the robot, such as through camera
images, the resulting method is general and broadly applicable. In
contrast to conventional computer vision methods, which require
humans to manually label thousands or even millions of images, building
video prediction models only requires unannotated video, which can be
collected by the robot entirely autonomously. Indeed, video prediction
models have also been applied to datasets that represent everything
from human activities to driving, with compelling results.
“Children can learn about their world by playing with toys, moving them
around, grasping, and so forth. Our aim with this research is to enable a
robot to do the same: to learn about how the world works through
autonomous interaction,” Levine said. “The capabilities of this robot are
still limited, but its skills are learned entirely automatically, and allow it to
predict complex physical interactions with objects that it has never seen
before by building on previously observed patterns of interaction.”
The Berkeley scientists are continuing to research control through video
prediction, focusing on further improving video prediction and
prediction-based control, as well as developing more sophisticated
methods by which robots can collected more focused video data, for
complex tasks such as picking and placing objects and manipulating soft
and deformable objects such as cloth or rope, and assembly.

More Related Content

Similar to Robots are Learning to Pick up Objects Like Babies

Control of a Movable Robot Head Using Vision-Based Object Tracking
Control of a Movable Robot Head Using Vision-Based Object TrackingControl of a Movable Robot Head Using Vision-Based Object Tracking
Control of a Movable Robot Head Using Vision-Based Object TrackingIJECEIAES
 
A cognitive robot equipped with autonomous tool innovation expertise
A cognitive robot equipped with autonomous tool  innovation expertise A cognitive robot equipped with autonomous tool  innovation expertise
A cognitive robot equipped with autonomous tool innovation expertise IJECEIAES
 
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operatorProposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operatorQUESTJOURNAL
 
Computer Vision for Object Detection and Tracking.pdf
Computer Vision for Object Detection and Tracking.pdfComputer Vision for Object Detection and Tracking.pdf
Computer Vision for Object Detection and Tracking.pdfNexgits Private Limited
 
Natural Semantics for a Robot notes:
Natural Semantics for a Robot notes: Natural Semantics for a Robot notes:
Natural Semantics for a Robot notes: butest
 
Page 1 of 14 ENS4152 Project Development Proposal a.docx
Page 1 of 14  ENS4152 Project Development Proposal a.docxPage 1 of 14  ENS4152 Project Development Proposal a.docx
Page 1 of 14 ENS4152 Project Development Proposal a.docxkarlhennesey
 
Page 1 of 14 ENS4152 Project Development Proposal a.docx
Page 1 of 14  ENS4152 Project Development Proposal a.docxPage 1 of 14  ENS4152 Project Development Proposal a.docx
Page 1 of 14 ENS4152 Project Development Proposal a.docxsmile790243
 
Page 1 of 14 ENS4152 Project Development Proposal a.docx
Page 1 of 14  ENS4152 Project Development Proposal a.docxPage 1 of 14  ENS4152 Project Development Proposal a.docx
Page 1 of 14 ENS4152 Project Development Proposal a.docxjakeomoore75037
 
A novel enhanced algorithm for efficient human tracking
A novel enhanced algorithm for efficient human trackingA novel enhanced algorithm for efficient human tracking
A novel enhanced algorithm for efficient human trackingIJICTJOURNAL
 
ACRV Research Fellow Intro/Tutorial [Vision and Action]
ACRV Research Fellow Intro/Tutorial [Vision and Action]ACRV Research Fellow Intro/Tutorial [Vision and Action]
ACRV Research Fellow Intro/Tutorial [Vision and Action]Juxi Leitner
 
Branches Of Robotics
Branches Of RoboticsBranches Of Robotics
Branches Of Roboticsparthmullick
 
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...IRJET Journal
 
Gesture detection by virtual surface
Gesture detection by virtual surfaceGesture detection by virtual surface
Gesture detection by virtual surfaceAshish Garg
 
Scheme for motion estimation based on adaptive fuzzy neural network
Scheme for motion estimation based on adaptive fuzzy neural networkScheme for motion estimation based on adaptive fuzzy neural network
Scheme for motion estimation based on adaptive fuzzy neural networkTELKOMNIKA JOURNAL
 
Anthrotronix 05_15MissionCritical_web-3
Anthrotronix 05_15MissionCritical_web-3Anthrotronix 05_15MissionCritical_web-3
Anthrotronix 05_15MissionCritical_web-3Scott Kesselman
 
A Robotic Prototype System for Child Monitoring
A Robotic Prototype System for Child MonitoringA Robotic Prototype System for Child Monitoring
A Robotic Prototype System for Child MonitoringWaqas Tariq
 
Follow Me Robot Technology
Follow Me Robot TechnologyFollow Me Robot Technology
Follow Me Robot Technologyijsrd.com
 
Object Detection and Tracking AI Robot
Object Detection and Tracking AI RobotObject Detection and Tracking AI Robot
Object Detection and Tracking AI RobotIRJET Journal
 

Similar to Robots are Learning to Pick up Objects Like Babies (20)

Control of a Movable Robot Head Using Vision-Based Object Tracking
Control of a Movable Robot Head Using Vision-Based Object TrackingControl of a Movable Robot Head Using Vision-Based Object Tracking
Control of a Movable Robot Head Using Vision-Based Object Tracking
 
A cognitive robot equipped with autonomous tool innovation expertise
A cognitive robot equipped with autonomous tool  innovation expertise A cognitive robot equipped with autonomous tool  innovation expertise
A cognitive robot equipped with autonomous tool innovation expertise
 
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operatorProposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
 
Computer Vision for Object Detection and Tracking.pdf
Computer Vision for Object Detection and Tracking.pdfComputer Vision for Object Detection and Tracking.pdf
Computer Vision for Object Detection and Tracking.pdf
 
Natural Semantics for a Robot notes:
Natural Semantics for a Robot notes: Natural Semantics for a Robot notes:
Natural Semantics for a Robot notes:
 
Page 1 of 14 ENS4152 Project Development Proposal a.docx
Page 1 of 14  ENS4152 Project Development Proposal a.docxPage 1 of 14  ENS4152 Project Development Proposal a.docx
Page 1 of 14 ENS4152 Project Development Proposal a.docx
 
Page 1 of 14 ENS4152 Project Development Proposal a.docx
Page 1 of 14  ENS4152 Project Development Proposal a.docxPage 1 of 14  ENS4152 Project Development Proposal a.docx
Page 1 of 14 ENS4152 Project Development Proposal a.docx
 
Page 1 of 14 ENS4152 Project Development Proposal a.docx
Page 1 of 14  ENS4152 Project Development Proposal a.docxPage 1 of 14  ENS4152 Project Development Proposal a.docx
Page 1 of 14 ENS4152 Project Development Proposal a.docx
 
A novel enhanced algorithm for efficient human tracking
A novel enhanced algorithm for efficient human trackingA novel enhanced algorithm for efficient human tracking
A novel enhanced algorithm for efficient human tracking
 
ACRV Research Fellow Intro/Tutorial [Vision and Action]
ACRV Research Fellow Intro/Tutorial [Vision and Action]ACRV Research Fellow Intro/Tutorial [Vision and Action]
ACRV Research Fellow Intro/Tutorial [Vision and Action]
 
Perception capabilities of ai robots
Perception capabilities of ai robotsPerception capabilities of ai robots
Perception capabilities of ai robots
 
Intro
IntroIntro
Intro
 
Branches Of Robotics
Branches Of RoboticsBranches Of Robotics
Branches Of Robotics
 
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...
 
Gesture detection by virtual surface
Gesture detection by virtual surfaceGesture detection by virtual surface
Gesture detection by virtual surface
 
Scheme for motion estimation based on adaptive fuzzy neural network
Scheme for motion estimation based on adaptive fuzzy neural networkScheme for motion estimation based on adaptive fuzzy neural network
Scheme for motion estimation based on adaptive fuzzy neural network
 
Anthrotronix 05_15MissionCritical_web-3
Anthrotronix 05_15MissionCritical_web-3Anthrotronix 05_15MissionCritical_web-3
Anthrotronix 05_15MissionCritical_web-3
 
A Robotic Prototype System for Child Monitoring
A Robotic Prototype System for Child MonitoringA Robotic Prototype System for Child Monitoring
A Robotic Prototype System for Child Monitoring
 
Follow Me Robot Technology
Follow Me Robot TechnologyFollow Me Robot Technology
Follow Me Robot Technology
 
Object Detection and Tracking AI Robot
Object Detection and Tracking AI RobotObject Detection and Tracking AI Robot
Object Detection and Tracking AI Robot
 

Recently uploaded

FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 

Recently uploaded (20)

FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

Robots are Learning to Pick up Objects Like Babies

  • 1. Robots are Learning to Pick up Objects Like Babies In the future, this technology could help self-driving cars anticipate future events on the road and produce more intelligent robotic assistants in homes, but the initial prototype focuses on learning simple manual skills entirely from autonomous play. Using this technology, called visual foresight, the robots can predict what their cameras will see if they perform a particular sequence of movements. These robotic imaginations are still relatively simple for now – predictions made only several seconds into the future – but they are enough for the robot to figure out how to move objects around on a table without disturbing obstacles. Crucially, the robot can learn to perform these tasks without any help from humans or prior knowledge about physics, its environment or what the objects are. That’s because the visual imagination is learned entirely from scratch from unattended and unsupervised exploration, where the robot plays with objects on a table. After this play phase, the robot builds a predictive model of the world, and can use this model to manipulate new objects that it has not seen before. “In the same way that we can imagine how our actions will move the objects in our environment, this method can enable a robot to visualize how different behaviors will affect the world around it,” said Sergey Levine, assistant professor in Berkeley’s Department of Electrical Engineering and Computer Sciences, whose lab developed the
  • 2. technology. “This can enable intelligent planning of highly flexible skills in complex real-world situations.” At the core of this system is a deep learning technology based on convolutional recurrent video prediction, or dynamic neural advection (DNA). DNA-based models predict how pixels in an image will move from one frame to the next based on the robot’s actions. Recent improvements to this class of models, as well as greatly improved planning capabilities, have enabled robotic control based on video prediction to perform increasingly complex tasks, such as sliding toys around obstacles and repositioning multiple objects. In that past, robots have learned skills with a human supervisor helping and providing feedback. What makes this work exciting is that the robots can learn a range of visual object manipulation skills entirely on their own,” said Chelsea Finn, a doctoral student in Levine’s lab and inventor of the original DNA model. With the new technology, a robot pushes objects on a table, then uses the learned prediction model to choose motions that will move an object to a desired location. Robots use the learned model from raw camera observations to teach themselves how to avoid obstacles and push objects around obstructions. “Humans learn object manipulation skills without any teacher through millions of interactions with a variety of objects during their lifetime. We have shown that it possible to build a robotic system that also leverages large amounts of autonomously collected data to learn widely applicable manipulation skills, specifically object pushing skills,” said Frederik Ebert, a graduate student in Levine’s lab who worked on the project. Since control through video prediction relies only on observations that can be collected autonomously by the robot, such as through camera images, the resulting method is general and broadly applicable. In
  • 3. contrast to conventional computer vision methods, which require humans to manually label thousands or even millions of images, building video prediction models only requires unannotated video, which can be collected by the robot entirely autonomously. Indeed, video prediction models have also been applied to datasets that represent everything from human activities to driving, with compelling results. “Children can learn about their world by playing with toys, moving them around, grasping, and so forth. Our aim with this research is to enable a robot to do the same: to learn about how the world works through autonomous interaction,” Levine said. “The capabilities of this robot are still limited, but its skills are learned entirely automatically, and allow it to predict complex physical interactions with objects that it has never seen before by building on previously observed patterns of interaction.” The Berkeley scientists are continuing to research control through video prediction, focusing on further improving video prediction and prediction-based control, as well as developing more sophisticated methods by which robots can collected more focused video data, for complex tasks such as picking and placing objects and manipulating soft and deformable objects such as cloth or rope, and assembly.