Introduction
• Recognizing the 3D geometry of objects is important for automating human operations, e.g. grasping.
• However, recognizing the 3D geometry of transparent objects is difficult
– because of specular highlights and light reflected from the surface behind the object.
• ClearGrasp proposes a method to recognize the 3D geometry of transparent objects from a single RGB-D image
– by using Sim2Real techniques and a CNN architecture designed for the task.
Supplement: Errors in depth estimation for transparent objects
Transparent objects cause errors in depth estimation from RGB-D cameras.
– Type I: errors caused by specular highlights.
– Type II: errors caused by light reflected from the surface behind the object.
Sim2Real: Learning from synthetic data.
• Training a NN for image recognition needs many images and labels, which is laborious, costly,
and time-consuming.
• Image synthesis techniques can generate many labeled images automatically.
(figure: examples of synthetic data vs. real data)
Related work on Sim2Real
• Synthetic data has been used in various tasks, but little of this research concerns transparent
objects.
• Datasets that include transparent objects had not been used for 3D reconstruction.
Synthetic data has been used in various tasks, but little of this research concerns
transparent objects.
• A synthetic dataset was used to control a humanoid hand: Learning Dexterous In-Hand Manipulation
• A synthetic dataset was used for grasping objects: Domain Randomization for Transferring Deep
Neural Networks from Simulation to the Real World
Datasets that include transparent objects had not been used for 3D reconstruction.
Synthetic datasets containing transparent objects were used for refractive flow estimation, semantic
segmentation, or relative depth.
• refractive flow estimation: TOM-Net: Learning Transparent Object Matting from a Single Image
• semantic segmentation: Material-Based Segmentation of Objects
• relative depth: Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
Paper Information
• Title: ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation
• Authors: Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng,
Shuran Song
• Institutes: Synthesis.ai, Google, Columbia University.
• Research Page: https://sites.google.com/view/cleargrasp
• Dataset: https://sites.google.com/view/cleargrasp/data?authuser=0
• Code: https://github.com/Shreeyak/cleargrasp
• Publication: ICRA 2020
• Blog: https://ai.googleblog.com/2020/02/learning-to-see-transparent-objects.html
Abstract
• Created synthetic and real 3D geometry datasets of transparent objects.
• Proposed an architecture to infer accurate 3D geometry of transparent objects from a single RGB-D
image.
ClearGrasp dataset: Synthetic dataset
• 50,000 photorealistic renders.
• Includes surface normals, segmentation masks, edges, and depth.
• Each image contains up to 5 transparent objects.
• Transparent objects are on a flat ground or inside a tote with various backgrounds and lighting.
• Synthetic images are generated with Blender's physics and rendering engines.
ClearGrasp dataset: Real dataset
• Real dataset
– 286 images.
– Includes RGB images and depth images.
– Each image contains up to 6 transparent objects with an average of 2 objects per image.
– Transparent objects were placed in the scene along with various random opaque objects like
cardboard boxes, decorative mantelpieces and fruits.
– To obtain ground-truth depth, they first placed spray-painted (opaque) replicas of the transparent
objects, captured their depth, then swapped in the transparent objects using a GUI app so that
sub-millimeter accuracy could be achieved in the positioning of the objects.
Method: RGB image → surface normals / masks / occlusion boundaries
• Transparent object segmentation
– outputs: the pixel-wise masks of transparent objects.
• Surface Normal estimation
– output: surface normal (3 dims), L2 normalized
• Boundary detection
– output: per-pixel labels of the input image (Non-Edge / Occlusion Boundary / Contact Edge)
※ all network architectures are DeepLabv3 + DRN-D-54
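The surface-normal head's 3-channel output is L2-normalized per pixel. A minimal numpy sketch of that step (the actual DeepLabv3 + DRN-D-54 networks are not shown; `l2_normalize_normals` and the toy input are illustrative):

```python
import numpy as np

def l2_normalize_normals(raw, eps=1e-8):
    """Normalize a (H, W, 3) raw network output into unit surface normals."""
    norm = np.linalg.norm(raw, axis=-1, keepdims=True)
    return raw / np.maximum(norm, eps)

# Toy 1x2 "image" of raw 3-vectors standing in for the network output.
raw = np.array([[[0.0, 0.0, 2.0],
                 [3.0, 0.0, 4.0]]])
normals = l2_normalize_normals(raw)
# normals[0, 1] is now [0.6, 0.0, 0.8], a unit vector.
```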
Method: Global optimization
• The objective is a weighted sum of three terms, E = λ_D E_D + λ_N E_N + λ_S E_S.
• notation
– E_D: the distance between the estimated depth D(p) and the observed raw depth D0(p) at
pixel p.
– E_N: measures the consistency between the estimated depth and the predicted surface
normal N(p).
– E_S: encourages adjacent pixels to have the same depths.
– B ∈ [0, 1] downweights the normal term based on the predicted probability B(p) that a pixel
lies on an occlusion boundary.
• Because the matrix form of the system of equations is sparse and symmetric positive definite,
it can be solved efficiently with a sparse Cholesky factorization.
※ This method was proposed in “Deep Depth Completion of a Single RGB-D Image” (CVPR 2018).
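A toy 1-D analogue of this optimization can be sketched as below: a data term pulls the solution toward observed depth, and a smoothness term couples neighboring pixels, yielding a sparse SPD system. The weights `lam_d`/`lam_s` are illustrative, and SciPy's general sparse solver stands in for the sparse Cholesky factorization the paper relies on:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def complete_depth_1d(d_obs, observed, lam_d=1.0, lam_s=10.0):
    """Minimize  lam_d * sum_p observed(p) * (D(p) - D0(p))^2
               + lam_s * sum_p (D(p) - D(p+1))^2.
    The normal equations A D = b are sparse and symmetric positive
    definite; spsolve is used here in place of a Cholesky solver."""
    n = len(d_obs)
    W = sp.diags(lam_d * observed.astype(float))            # data term
    S = sp.diags([np.ones(n - 1), -np.ones(n - 1)], [0, 1],
                 shape=(n - 1, n))                          # finite differences
    A = (W + lam_s * (S.T @ S)).tocsc()
    b = lam_d * observed * d_obs
    return spsolve(A, b)

# Depth observed only at the two ends (the middle mimics a transparent
# region with missing depth); the optimization fills it in smoothly.
d_obs = np.array([1.0, 0.0, 0.0, 0.0, 2.0])
observed = np.array([1, 0, 0, 0, 1])
depth = complete_depth_1d(d_obs, observed)
```

With a heavy smoothness weight the recovered depth rises gently from near 1.0 to near 2.0 instead of exactly hitting the noisy endpoints.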
Experiment Setting
• Dataset Notation
– Syn-train: Synthetic training set with 5 objects
– Syn-known: Synthetic validation set for training objects.
– Syn-novel: Synthetic test set of 4 novel objects.
– MP+SN: Out-of-domain real-world RGB-D datasets of indoor scenes that do not contain
transparent objects’ depth (Matterport3D [7] and ScanNet [11]).
– Real-known: Real-world test set for all 5 of the training objects.
– Real-novel: Real world test set of 5 novel objects, including 3 not present in synthetic data.
• Metrics
– RMSE: the Root Mean Squared Error in meters.
– Rel: the median error relative to the true depth, |predicted − true| / true.
– δ thresholds: the percentage of pixels with max(predicted/true, true/predicted) < δ, for
δ = 1.05, 1.10, or 1.25.
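These three metrics can be sketched in a few lines of numpy; `depth_metrics` and the toy 3-pixel arrays are illustrative, not the paper's evaluation code:

```python
import numpy as np

def depth_metrics(pred, true):
    """Compute RMSE (in the input's units, e.g. meters), median relative
    error, and the percentage of pixels within each ratio threshold."""
    rmse = np.sqrt(np.mean((pred - true) ** 2))
    rel = np.median(np.abs(pred - true) / true)
    ratio = np.maximum(pred / true, true / pred)
    deltas = {d: 100.0 * np.mean(ratio < d) for d in (1.05, 1.10, 1.25)}
    return rmse, rel, deltas

# Toy 3-pixel example: ratios are 1.0, 1.06, and 1.3.
pred = np.array([1.00, 2.12, 3.90])
true = np.array([1.00, 2.00, 3.00])
rmse, rel, deltas = depth_metrics(pred, true)
# rel is 0.06; one pixel passes delta=1.05, two pass delta=1.10 and 1.25.
```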
Results on real-world images and novel object shapes (quantitative results)
• Generalization: real-world images
– achieved similar RMSE and Rel scores on real-world domain.
• Generalization: novel-object shapes
– able to generalize to previously unseen object shapes.
Robot manipulation
• Environment setting
– A pile of 3 to 5 transparent objects is presented on a table.
– Suction and a parallel-jaw gripper are tested as end-effectors.
– For each end-effector type, with and without ClearGrasp, a grasping algorithm is trained with
500 trial-and-error grasping attempts, then tested with 50 attempts.
– The picking algorithm is the same as in “Robotic Pick-and-Place of Novel Objects in Clutter
with Multi-Affordance Grasping and Cross-Domain Image Matching” (ICRA 2018).
• Metrics
– success rate = # successful picks / # picking attempts
• Results
end-effector   w/o ClearGrasp   w/ ClearGrasp
suction        64%              86%
parallel-jaw   12%              72%
Picking algorithm
• An FCN infers pixel-wise suction or grasp success probabilities from rotated heightmaps
generated from the RGB-D images.
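Given such per-rotation probability maps, selecting the pick reduces to an argmax. A minimal sketch, where `best_pick` and `prob_maps` are hypothetical stand-ins for the FCN output (shape: rotations × H × W), not the paper's code:

```python
import numpy as np

def best_pick(prob_maps):
    """Return the (rotation, row, col) index with the highest predicted
    pick success probability, plus that probability."""
    idx = np.unravel_index(np.argmax(prob_maps), prob_maps.shape)
    return idx, prob_maps[idx]

# Stand-in for the FCN output: one probability map per gripper rotation
# over an 8x8 heightmap (4 rotations here).
rng = np.random.default_rng(0)
prob_maps = 0.5 * rng.random((4, 8, 8))   # background scores below 0.5
prob_maps[2, 5, 3] = 0.99                 # one clearly best location
(rot, row, col), p = best_pick(prob_maps)
# The robot would execute the pick at pixel (5, 3) with rotation index 2.
```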
Conclusion
• ClearGrasp (“3D Shape Estimation of Transparent Objects for Manipulation”) recovered the 3D
geometry of transparent objects by
– creating synthetic and real 3D geometry datasets of transparent objects, and
– proposing an architecture to infer accurate 3D geometry of transparent objects from a single
RGB-D image.
• We can utilize these ideas in our own research, especially in
– Sim2Real computer vision, and
– research or development that uses depth cameras.
References
• ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation
– https://arxiv.org/abs/1910.02550
• Soccer On Your Tabletop
– http://grail.cs.washington.edu/projects/soccer/
• Semantic Scene Completion from a Single Depth Image
– https://arxiv.org/pdf/1611.08974.pdf
• TOM-Net: Learning Transparent Object Matting from a Single Image
– http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_TOM-Net_Learning_Transparent_CVPR_2018_paper.pdf
• Material-Based Segmentation of Objects
– https://cseweb.ucsd.edu/~mkchandraker/pdf/wacv19_transparent.pdf
• Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
– http://people.compute.dtu.dk/jerf/papers/matseg.pdf
• A Geodesic Active Contour Framework for Finding Glass
– http://isda.ncsa.illinois.edu/~kmchenry/documents/cvpr06a.pdf
• Friend or foe: exploiting sensor failures for transparent object localization and classification.
– https://www.uni-koblenz.de/~agas/Public/Seib2017FOF.pdf
• Glass Object Localization by Joint Inference of Boundary and Depth
– https://xmhe.bitbucket.io/papers/icpr12.pdf
• A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects
– https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2B_072.pdf
• Seeing Glassware: from Edge Detection to Pose Estimation and Shape Recovery
– http://www.roboticsproceedings.org/rss12/p21.pdf
• Learning Dexterous In-Hand Manipulation
– https://arxiv.org/abs/1808.00177
• Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
– https://arxiv.org/abs/1703.06907