ClearGrasp
2020/03/13
Joji Toyama
Introduction
• Recognizing the 3D geometry of objects is important for automating human operations, e.g. grasping.
• However, recognizing the 3D geometry of transparent objects is difficult
– because of specular highlights and reflections from the surface behind the object.
• ClearGrasp proposes a method to recognize the 3D geometry of transparent objects from a single RGB-D image
– by using Sim2Real techniques and proposing a CNN architecture for the task.
Supplement: Errors in depth estimation for transparent objects
Transparent objects cause errors in depth estimation from RGB-D cameras.
– Type I: errors caused by specular highlights.
– Type II: errors caused by reflections from the surface behind the object.
Sim2Real: Learning from synthetic data
• Training a NN for image recognition requires many images and labels, which is laborious, costly and time consuming.
• Image synthesis techniques can generate many images with labels.
(figure: synthetic data vs. real data)
Related work on Sim2Real
• Synthetic data has been used in various tasks, but little research concerns transparent objects.
• Transparency-inclusive datasets had not been used for 3D reconstruction.
Synthetic data has been used in various tasks, but little research concerns transparent objects.
Synthetic datasets were used for controlling a humanoid hand:
– Learning Dexterous In-Hand Manipulation
Synthetic datasets were used for grasping objects:
– Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Transparency-inclusive datasets had not been used for 3D reconstruction.
Synthetic datasets containing transparent objects were used for refractive flow estimation, semantic segmentation, or relative depth.
refractive flow estimation:
– TOM-Net: Learning Transparent Object Matting from a Single Image
semantic segmentation:
– Material-Based Segmentation of Objects
relative depth:
– Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
Paper Information
• Title: ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation
• Authors: Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng,
Shuran Song
• Institutes: Synthesis.ai, Google, Columbia University.
• Research Page: https://sites.google.com/view/cleargrasp
• Dataset: https://sites.google.com/view/cleargrasp/data?authuser=0
• Code: https://github.com/Shreeyak/cleargrasp
• Publication: ICRA 2020
• Blog: https://ai.googleblog.com/2020/02/learning-to-see-transparent-objects.html
Results of ClearGrasp (figure)
Abstract
• Created a synthetic and real 3D geometry dataset of transparent objects.
• Proposed an architecture to infer accurate 3D geometry of transparent objects from a single RGB-D image.
ClearGrasp dataset: Synthetic dataset
• 50,000 photorealistic renders.
• Includes surface normals, segmentation masks, edges, and depth.
• Each image contains up to 5 transparent objects.
• Transparent objects sit on flat ground or inside a tote, with various backgrounds and lighting.
• Synthetic images are generated with Blender's physics and rendering engines.
ClearGrasp dataset: Real dataset
• Real dataset
– 286 images.
– Includes RGB images and depth images.
– Each image contains up to 6 transparent objects, with an average of 2 objects per image.
– Transparent objects were placed in the scene along with various random opaque objects such as cardboard boxes, decorative mantelpieces and fruits.
– To obtain ground-truth depth, each transparent object was replaced with an identical spray-painted replica; a GUI app was used so that sub-millimeter accuracy could be achieved in positioning the objects.
Overview of proposed CNN architecture
Method: RGB image → surface normals / masks / occlusion boundaries
• Transparent object segmentation
– output: pixel-wise masks of transparent objects.
• Surface normal estimation
– output: surface normals (3 dims), L2-normalized.
• Boundary detection
– output: per-pixel labels of the input image (Non-Edge / Occlusion Boundary / Contact Edges).
※ All three networks use DeepLabv3+ with a DRN-D-54 backbone.
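The surface-normal branch outputs a 3-channel map that is L2-normalized per pixel. A minimal NumPy sketch of that normalization step (an illustration, not the authors' code):

```python
import numpy as np

def normalize_normals(raw, eps=1e-8):
    """L2-normalize a predicted surface-normal map of shape (H, W, 3)."""
    norm = np.linalg.norm(raw, axis=-1, keepdims=True)  # per-pixel vector length
    return raw / np.maximum(norm, eps)                  # guard against zero vectors

# toy 2x2 "normal map" with unnormalized predictions
raw = np.array([[[0.0, 0.0, 2.0], [3.0, 0.0, 4.0]],
                [[1.0, 1.0, 1.0], [0.0, 5.0, 0.0]]])
normals = normalize_normals(raw)
```

After this step every pixel's normal is a unit vector, so the dot products used later in the normal-consistency term are well defined.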
Method: Global optimization
• The objective is a weighted sum of three error terms (shown in the figure on the right).
• Notation:
– E_D: the distance between the estimated depth D(p) and the observed raw depth D_0(p) at pixel p.
– E_N: measures the consistency between the estimated depth and the predicted surface normal N(p).
– E_S: encourages adjacent pixels to have the same depths.
– B ∈ [0, 1] downweights the normal term based on the predicted probability B(p) that a pixel lies on an occlusion boundary.
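The slide shows the objective only as an image. Based on the notation above and the cited depth-completion paper, it can be reconstructed roughly as follows (the λ weights are hyperparameters, v(p) denotes the 3D point backprojected from pixel p, and 𝒩(p) the neighbors of p; treat this as a reconstruction, not a verbatim copy):

```latex
E = \lambda_D E_D + \lambda_N E_N + \lambda_S E_S \\
E_D = \sum_{p} \bigl( D(p) - D_0(p) \bigr)^2 \\
E_N = \sum_{p,\, q \in \mathcal{N}(p)} B(p)\, \bigl| \langle v(p) - v(q),\; N(p) \rangle \bigr|^2 \\
E_S = \sum_{p,\, q \in \mathcal{N}(p)} \bigl( D(p) - D(q) \bigr)^2
```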
• Because the matrix form of the system of equations is sparse and symmetric positive definite, it can be solved efficiently with a sparse Cholesky factorization.
※ This method was proposed in "Deep Depth Completion of a Single RGB-D Image" (CVPR 2018).
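To see why the optimization reduces to one sparse linear solve, here is a hedged toy sketch: a 1-D "depth completion" with only data and smoothness terms (no normal term), assembled into sparse normal equations and solved directly. This is an illustration of the structure, not the paper's implementation.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# Toy 1-D depth completion: n pixels, raw depth observed only where w = 1.
# Minimize  lam_d * sum_i w_i (d_i - d0_i)^2  +  lam_s * sum_i (d_i - d_{i+1})^2
n = 6
d0 = np.array([1.0, 1.0, 0.0, 0.0, 2.0, 2.0])   # raw depth (0 = missing)
w = np.array([1, 1, 0, 0, 1, 1], dtype=float)    # 0 where depth is invalid
lam_d, lam_s = 1.0, 1.0

# First-difference operator encoding the smoothness term
D = sparse.diags([np.ones(n - 1), -np.ones(n - 1)], [0, 1], shape=(n - 1, n))
A = lam_d * sparse.diags(w) + lam_s * (D.T @ D)  # sparse, symmetric positive definite
b = lam_d * w * d0

# Production code would use a sparse Cholesky factorization (e.g. CHOLMOD);
# spsolve stands in here for simplicity.
d = spsolve(A.tocsr(), b)
```

The missing pixels (indices 2 and 3) get depths interpolated between the observed values of 1 and 2, which is exactly the behavior the global optimization extends to 2-D with the normal and boundary terms.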
Results
• Results on real-world images and novel object shapes.
• Robot manipulation
Experiment Setting
• Dataset notation
– Syn-train: synthetic training set with 5 objects.
– Syn-known: synthetic validation set of the training objects.
– Syn-novel: synthetic test set of 4 novel objects.
– MP+SN: out-of-domain real-world RGB-D datasets of indoor scenes that contain no transparent objects' depth (Matterport3D [7] and ScanNet [11]).
– Real-known: real-world test set for all 5 of the training objects.
– Real-novel: real-world test set of 5 novel objects, including 3 not present in the synthetic data.
• Metrics
– RMSE: the root mean squared error, in meters.
– Rel: the median relative error, |predicted − true| / true.
– δ: the percentage of pixels whose predicted depth falls within a threshold of the true depth, for δ = 1.05, 1.10, 1.25.
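A minimal sketch of how these metrics could be computed, assuming the common max-ratio convention for the δ threshold (the slide's notation is ambiguous, so this is an illustration, not the paper's evaluation code):

```python
import numpy as np

def depth_metrics(pred, true, thresholds=(1.05, 1.10, 1.25)):
    """RMSE, median relative error, and threshold accuracies over valid pixels."""
    pred = np.asarray(pred, dtype=float).ravel()
    true = np.asarray(true, dtype=float).ravel()
    rmse = float(np.sqrt(np.mean((pred - true) ** 2)))      # in meters
    rel = float(np.median(np.abs(pred - true) / true))      # median relative error
    ratio = np.maximum(pred / true, true / pred)            # max-ratio convention
    acc = {t: float(np.mean(ratio < t)) for t in thresholds}
    return rmse, rel, acc

# toy example: three pixels, ground truth all at 1 m
rmse, rel, acc = depth_metrics([1.0, 1.2, 0.9], [1.0, 1.0, 1.0])
```

Here acc[1.25] counts every pixel, while acc[1.05] keeps only the pixel predicted exactly right, matching the intuition that tighter thresholds are harder to satisfy.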
Results on real-world images and novel object shapes (quantitative results)
• Generalization to real-world images
– achieved similar RMSE and Rel scores in the real-world domain.
• Generalization to novel object shapes
– able to generalize to previously unseen object shapes.
Results on real-world images and novel object shapes
• qualitative results.
Robot manipulation
• Environment setting
– A pile of 3 to 5 transparent objects is presented on a table.
– Suction and a parallel-jaw gripper are tested as end-effectors.
– For each end-effector type, with and without ClearGrasp, a grasping algorithm is trained with 500 trial-and-error grasping attempts, then tested with 50 attempts.
– The picking algorithm is the same as "Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching" (ICRA 2018).
• Metrics
– success rate = (# successful picks) / (# picking attempts)
• Results
end-effector     w/o ClearGrasp    w/ ClearGrasp
suction          64%               86%
parallel-jaw     12%               72%
Picking algorithm
• An FCN infers pixel-wise suction or grasp success probabilities from rotated heightmaps generated from RGB-D images.
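The picking step selects the highest-scoring pixel across rotated heightmaps. A toy sketch of that selection loop (the real system uses a trained FCN and many rotation angles; `dummy_fcn` and the use of only four right-angle rotations are hypothetical stand-ins for illustration):

```python
import numpy as np

def dummy_fcn(heightmap):
    """Stand-in for the trained FCN: maps a heightmap to a per-pixel
    grasp-success probability map (hypothetical, for illustration only)."""
    return heightmap / (heightmap.max() + 1e-8)

def best_grasp(heightmap, n_rot=4):
    """Score right-angle rotations of the heightmap and return
    (rotation index, row, col) of the highest-probability pixel."""
    best_prob, best_pick = -1.0, None
    for k in range(n_rot):
        rotated = np.rot90(heightmap, k)          # orientation-specific view
        probs = dummy_fcn(rotated)                # per-pixel success scores
        idx = np.unravel_index(np.argmax(probs), probs.shape)
        if probs[idx] > best_prob:
            best_prob, best_pick = probs[idx], (k, *idx)
    return best_pick

hm = np.zeros((4, 4)); hm[1, 2] = 1.0             # a single raised point
k, r, c = best_grasp(hm)
```

Rotating the heightmap instead of the gripper lets a single FCN score grasps at multiple orientations with one forward pass per rotation.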
Conclusion
• ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation recovered the 3D geometry of transparent objects by
– creating a synthetic and real 3D geometry dataset of transparent objects, and
– proposing an architecture to infer accurate 3D geometry of transparent objects from a single RGB-D image.
• We can utilize these ideas in our research, especially in
– sim2real computer vision, and
– research or development that uses depth cameras.
References
• ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation
– https://arxiv.org/abs/1910.02550
• Soccer On Your Tabletop
– http://grail.cs.washington.edu/projects/soccer/
• Semantic Scene Completion from a Single Depth Image
– https://arxiv.org/pdf/1611.08974.pdf
• TOM-Net: Learning Transparent Object Matting from a Single Image
– http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_TOM-
Net_Learning_Transparent_CVPR_2018_paper.pdf
• Material-Based Segmentation of Objects
– https://cseweb.ucsd.edu/~mkchandraker/pdf/wacv19_transparent.pdf
• Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
– http://people.compute.dtu.dk/jerf/papers/matseg.pdf
• A Geodesic Active Contour Framework for Finding Glass
– http://isda.ncsa.illinois.edu/~kmchenry/documents/cvpr06a.pdf
• Friend or foe: exploiting sensor failures for transparent object localization and classification.
– https://www.uni-koblenz.de/~agas/Public/Seib2017FOF.pdf
• Glass Object Localization by Joint Inference of Boundary and Depth
– https://xmhe.bitbucket.io/papers/icpr12.pdf
• A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects
– https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2B_072.pdf
• Seeing Glassware: from Edge Detection to Pose Estimation and Shape Recovery
– http://www.roboticsproceedings.org/rss12/p21.pdf
• Learning Dexterous In-Hand Manipulation
– https://arxiv.org/abs/1808.00177
• Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
– https://arxiv.org/abs/1703.06907

More Related Content

What's hot

150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks
Junho Cho
 

What's hot (20)

Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Object Pose Estimation
Object Pose EstimationObject Pose Estimation
Object Pose Estimation
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Survey 1 (project overview)
Survey 1 (project overview)Survey 1 (project overview)
Survey 1 (project overview)
 
Object detection
Object detectionObject detection
Object detection
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningCodetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object tracking
 
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
 
Faster R-CNN
Faster R-CNNFaster R-CNN
Faster R-CNN
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Automatic Building detection for satellite Images using IGV and DSM
Automatic Building detection for satellite Images using IGV and DSMAutomatic Building detection for satellite Images using IGV and DSM
Automatic Building detection for satellite Images using IGV and DSM
 
PCL (Point Cloud Library)
PCL (Point Cloud Library)PCL (Point Cloud Library)
PCL (Point Cloud Library)
 
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet I
 
Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)
Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)
Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)
 

Similar to [DL輪読会]ClearGrasp

Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
Daniel Cahall
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
mokamojah
 

Similar to [DL輪読会]ClearGrasp (20)

3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learning
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving I
 
final_presentation
final_presentationfinal_presentation
final_presentation
 
Large Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdfLarge Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdf
 
fusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIfusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving II
 
ei2106-submit-opt-415
ei2106-submit-opt-415ei2106-submit-opt-415
ei2106-submit-opt-415
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
 
3D Image visualization
3D Image visualization3D Image visualization
3D Image visualization
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
cvpresentation-190812154654 (1).pptx
cvpresentation-190812154654 (1).pptxcvpresentation-190812154654 (1).pptx
cvpresentation-190812154654 (1).pptx
 
ppt 20BET1024.pptx
ppt 20BET1024.pptxppt 20BET1024.pptx
ppt 20BET1024.pptx
 
3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Computer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and PythonComputer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and Python
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusion
 

More from Deep Learning JP

More from Deep Learning JP (20)

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

[DL輪読会]ClearGrasp

  • 2. 2 Introduction • Recognizing 3D geometry of objects are important to automate human operation. ex. Grasping • However recognizing 3D geometry of transparent objects are difficult. – Because of the specular highlights or reflecting back from the surface behind the object. • ClearGrasp proposed the method to recognize 3D geometry of transparent objects from a single RGB-D Image. – By using Sim2Real techniques and proposing the CNN architecture for the task.
  • 3. 3 Supplement: Errors in depth estimation for transparent objects Transparent objects occur erros in depth estimation from RGB-D camera. – Type I: erros are caused by specular highlights. – Type II: erros are caused by reflecting back from the surface behind the object.
  • 4. 4 Sim2Real: Learning from synthetic data. • To train NN for image recognition, we need a lot of images and labels, which is laborious, costly and time consuming. • Image synthetic techniques can generate a lot of images with labels. synthetic data real data
  • 5. 5 Related work of Sim2Real • Synthetic data was used in various tasks, but the research which concern transparent objects is few. • Transparent-included dataset wasn’t used for 3D reconstruction.
  • 6. 6 Synthetic data was used in various tasks, but the research which concern transparent objects is few. Synthetic dataset was used for control humanoid hand. Learning Dexterous In-Hand Manipulation Synthetic dataset was used for grasping objects. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
  • 7. 7 Transparent-included dataset wasn’t used for 3D reconstruction. Synthetic dataset which contain transparent object was used for refractive flow estimation, semantic segmentation, or relative depth. reflective flow estimation TOM-Net: Learning Transparent Object Matting from a Single Image semantic segmentation Material-Based Segmentation of Objects relative path Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
  • 8. 8 Paper Information • Title: ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation • Authors: Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, Shuran Song • Institutes: Synthesis.ai, Google, Columbia University. • Research Page: https://sites.google.com/view/cleargrasp • Dataset: https://sites.google.com/view/cleargrasp/data?authuser=0 • Code: https://github.com/Shreeyak/cleargrasp • Publication: ICRA 2020 • Blog: https://ai.googleblog.com/2020/02/learning-to-see-transparent-objects.html
  • 10. 10 Abstract • Created synthetic and real 3D geometry dataset of transparent objects. • Propose an architecture to infer accurate 3D geometry of transparent objects from a single RGB- D image.
  • 11. 11 ClearGrasp dataset: Synthetic dataset • 50,000 photorealistic renders. • Include surface normals, segmentation masks edges, and depth. • Each image contains up to 5 transparent objects. • Transparent objects are on a flat ground or inside a tote with various backgrounds and lighting. • Synthetic Images are generated from Blender physics engine and rendering engine.
  • 12. 12 ClearGrasp dataset: Real dataset • Real dataset – 286 images. – Include RGB-Images and Depth Images. – Each image contains up to 6 transparent objects with an average of 2 objects per image. – Transparent objects were placed in the scene along with various random opaque objects like cardboard boxes, decorative mantelpieces and fruits. – First they put spray painted objects same as transparent ones to get depth and replaced using GUI app to put sub-millimeter accuracy can be achieved in the positioning of the objects.
  • 13. 13 Overview of proposed CNN architecture
  • 14. 14 Method: RGB Image→surface normals/mask/occulusion boundaries • Transparent object segmentation – outputs: the pixel-wise masks of transparent objects. • Surface Normal estimation – output: surface normal (3 dims), L2 normalized • Boundary detection – output: each pixel labels of the input images(Non-Edge/Occlusion Boundary/Contact Edges) ※ all network architectures are DeepLabv3 + DRN-D-54
  • 15. 15 Method: Global optimization • loss function is written in right. • notation – 𝐸 𝐷: the distance between the estimated depth 𝐷(𝑝) and the observed raw depth 𝐷0(𝑝) at pixel 𝑝. – 𝐸 𝑁: measures the consistency between the estimated depth and the predicted surface normal 𝑁(𝑝). – 𝐸𝑆: encourages adjacent pixels to have the same depths. – 𝐵 ∈ [0, 1] downweights the normal terms based on the predicted probability a pixel is on an occlusion boundary (𝐵(𝑝)). • the matrix form of the system of equations is sparse and symmetric positive definite, we can solve it efficiently with a sparse Cholesky factorization ※this method is proposed in “Deep depth completion of a single rgb-d image.” (CVPR 2018)
  • 16. 16 Results • Results on real-world images and novel object shapes. • Robot manipulation
  • 17. 17 Experiment Setting • Dataset Notation – Syn-train: synthetic training set with 5 objects. – Syn-known: synthetic validation set for the training objects. – Syn-novel: synthetic test set of 4 novel objects. – MP+SN: out-of-domain real-world RGB-D datasets of indoor scenes that do not contain transparent objects (Matterport3D [7] and ScanNet [11]). – Real-known: real-world test set for all 5 of the training objects. – Real-novel: real-world test set of 5 novel objects, including 3 not present in the synthetic data. • Metrics – RMSE: the root mean squared error in meters. – Rel: the median error relative to the true depth. – δ thresholds: percentages of pixels whose predicted depth is within a factor δ of the true depth (max(predicted/true, true/predicted) < δ, for δ = 1.05, 1.10, 1.25).
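The three metrics above can be sketched as follows (a minimal illustration; the function name, masking convention, and return layout are assumptions, not from the paper's evaluation code):

```python
import numpy as np

def depth_metrics(pred, true, mask=None):
    """RMSE (same units as depth), median relative error, and the
    fraction of pixels within a factor delta of the true depth."""
    pred = np.asarray(pred, dtype=float).ravel()
    true = np.asarray(true, dtype=float).ravel()
    if mask is not None:  # e.g. evaluate only on transparent-object pixels
        keep = np.asarray(mask, dtype=bool).ravel()
        pred, true = pred[keep], true[keep]
    rmse = float(np.sqrt(np.mean((pred - true) ** 2)))
    rel = float(np.median(np.abs(pred - true) / true))
    ratio = np.maximum(pred / true, true / pred)
    deltas = {d: float(np.mean(ratio < d)) for d in (1.05, 1.10, 1.25)}
    return rmse, rel, deltas
```

For instance, `depth_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])` gives RMSE ≈ 0.577, Rel = 0.0, and 2/3 of pixels within every δ threshold.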
  • 18. 18 Results on real-world images and novel object shapes (quantitative results) • Generalization: real-world images – the model achieves similar RMSE and Rel scores in the real-world domain. • Generalization: novel object shapes – the model is able to generalize to previously unseen object shapes.
  • 19. 19 Results on real-world images and novel object shapes • Qualitative results.
  • 20. 20 Robot manipulation • Environment setting – A pile of 3 to 5 transparent objects is presented on a table. – Suction and a parallel-jaw gripper are tested as end-effectors. – For each end-effector type, with and without ClearGrasp, a grasping algorithm is trained using 500 trial-and-error grasping attempts, then tested with 50 attempts. – The picking algorithm is the same as in "Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching" (ICRA 2018). • Metrics – success rate = # successful picks / # picking attempts • Results – suction: 64% without ClearGrasp → 86% with ClearGrasp – parallel-jaw: 12% without ClearGrasp → 72% with ClearGrasp
  • 21. 21 Picking algorithm • An FCN infers pixel-wise suction or grasp success probabilities from rotated heightmaps generated from RGB-D images.
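A minimal sketch of the selection step implied above: given per-rotation probability maps (stand-ins for the FCN outputs), pick the rotation and pixel with the highest predicted success. The function name and array layout are illustrative assumptions:

```python
import numpy as np

def best_grasp(prob_maps, angles):
    """Select the highest-probability grasp across rotated heightmaps.

    prob_maps: list of (H, W) success-probability maps, one per rotation.
    angles:    rotation angle corresponding to each map.
    Returns (angle, (row, col), probability) of the best candidate.
    """
    stacked = np.stack(prob_maps)  # shape (R, H, W)
    r, y, x = np.unravel_index(np.argmax(stacked), stacked.shape)
    return angles[r], (int(y), int(x)), float(stacked[r, y, x])
```

The argmax over all rotations and pixels jointly chooses both the grasp location and the gripper orientation in one step.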
  • 22. 22 Conclusion • ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation recovers the 3D geometry of transparent objects by: – creating synthetic and real 3D geometry datasets of transparent objects. – proposing an architecture to infer accurate 3D geometry of transparent objects from a single RGB-D image. • We can utilize these ideas in our research, especially in: – sim2real computer vision. – research or development that uses depth cameras.
  • 23. 23 References • ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation – https://arxiv.org/abs/1910.02550 • Soccer On Your Tabletop – http://grail.cs.washington.edu/projects/soccer/ • Semantic Scene Completion from a Single Depth Image – https://arxiv.org/pdf/1611.08974.pdf • TOM-Net: Learning Transparent Object Matting from a Single Image – http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_TOM- Net_Learning_Transparent_CVPR_2018_paper.pdf • Material-Based Segmentation of Objects – https://cseweb.ucsd.edu/~mkchandraker/pdf/wacv19_transparent.pdf • Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks – http://people.compute.dtu.dk/jerf/papers/matseg.pdf • A Geodesic Active Contour Framework for Finding Glass – http://isda.ncsa.illinois.edu/~kmchenry/documents/cvpr06a.pdf
  • 24. 24 References • Friend or foe: exploiting sensor failures for transparent object localization and classification. – https://www.uni-koblenz.de/~agas/Public/Seib2017FOF.pdf • Glass Object Localization by Joint Inference of Boundary and Depth – https://xmhe.bitbucket.io/papers/icpr12.pdf • A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects – https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2B_072.pdf • Seeing Glassware: from Edge Detection to Pose Estimation and Shape Recovery – http://www.roboticsproceedings.org/rss12/p21.pdf • Learning Dexterous In-Hand Manipulation – https://arxiv.org/abs/1808.00177 • Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World – https://arxiv.org/abs/1703.06907

Editor's Notes

  4. 3D shape reconstruction of transparent objects. Need: grasping. Enabling techniques: sim2real, NN architecture. Recognizing the 3D shape of transparent objects is important; objects made of glass or plastic can be transparent. Because depth cannot be captured for transparent objects, recognizing their 3D shape is difficult. The proposed method achieves recognition of transparent objects using the following techniques: dataset creation with synthetic images & sim2real; a NN architecture.