Differentiable Ray Sampling for Neural 3D Representation

Differentiable Ray Sampling for Neural 3D Representation.
PFN summer internship 2019 by Naoharu Shimada.

Published in: Technology

Differentiable Ray Sampling for Neural 3D Representation

  1. 1. Differentiable Ray Sampling for Neural 3D Representation
      N. H. Shimada
      Preferred Networks 2019 Research Internship
  2. 2. Single-view 3D reconstruction
      ・Grasping [Yan+ ICRA 2018]
      ・Autonomous driving [Mapillary blog]
  3. 3. Single-view 3D reconstruction
      ● 3D supervision
        ○ A large amount of 3D data is needed.
      Figure [Kato+ CVPR 2019]: input (image) → prediction model → output (3D geometry)
  4. 4. Single-view 3D reconstruction
      ● 2D supervision
        ○ End-to-end training uses only 2D images.
        ○ A differentiable renderer is needed.
      Figure [Kato+ CVPR 2019]: input (image) → prediction model → 3D geometry → rendering → output (image)
  5. 5. Single-view 3D reconstruction
      ● 3D geometry representation
                                    Mesh [1]   Voxel [2]   Neural 3D (SRN [3])   Neural 3D (Ours)
        Initial shape               ✕          ◯           ◯                     ◯
        Memory vs. resolution       ◯          ✕           ◯                     ◯
        Number of training views    ◯          ◯           (✕)                   ◯
        Accuracy (IoU)              0.71       0.73        -                     ???
      [1] Kato+ CVPR 2017  [2] Tulsiani+ CVPR 2018  [3] Sitzmann+ arXiv 2019
  6. 6. DRC (Tulsiani+ CVPR 2017)
      Figure: input (image) → encoder → decoder → 32³ voxel grid (occupancy) → rendered image
  7. 7. DRC (Tulsiani+ CVPR 2017) ● Differentiable rendering
  8. 8. DRC (Tulsiani+ CVPR 2017)
      Figure panels: Input (RGB), Ground truth, Prediction
  9. 9. Ours
      DRC (Tulsiani+ CVPR 2017): voxel grid representation as a function (x_i, y_i, z_i) → occupancy.
        Discrete input (32³ grid); memory increases cubically with resolution.
      Our idea: neural 3D representation as a function (x, y, z) → occupancy.
        Continuous input; memory stays constant at high resolution.
  10. 10. Ours
      ● Differentiable ray sampling
      Figure labels: d; translation probability; pixel value in mask images (0, 1)
  11. 11. Ours
      Figure labels: Input (image), Encoder, Decoder, parameters, 3D network with inputs x, y, z, Rendered image
  12. 12. Results
      ● 1 instance
      IoU, ours (DRC): Car 0.81 (0.73), Chair 0.53 (0.43)
      Figure panels: Ground truth, Prediction, Diff; voxelized 3D (sliced image) {prediction, gt, diff}
  13. 13. Results
      ● Multi-instance (Qualitative)
      Figure panels for Car and Chair: Input RGB, Ground truth, Prediction, Diff
  14. 14. Results
      ● Multi-instance (Quantitative)
        Accuracy (IoU)   Voxel (DRC)   Neural 3D (Ours)
        Car              0.73          0.72
        Chair            0.43          0.44
  15. 15. Results
      ● Multi-instance (Loss plots)
      Figure panels: Car, Chair
  16. 16. SRN (Sitzmann+ NeurIPS 2019)
      Figure labels: Input (image), Encoder, Decoder, parameters, 3D network with inputs x, y, z, SDF (?), d0, d1, d2, di, pixel generator, Rendered image
      The rendering part is also a network. → 50 images per object for training
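
Slides 9 and 10 describe the shape as a continuous function (x, y, z) → occupancy that is queried at points sampled along each camera ray. The following is a minimal sketch of that idea, not the authors' implementation. The analytic sphere in `occupancy` is a hypothetical stand-in for the trained 3D network, and the DRC-style rule that a mask pixel equals one minus the product of (1 - occupancy) over the samples is an assumption, as are all function names, the sampling range, and the sample count.

```python
import math

def occupancy(x, y, z, radius=0.5, sharpness=20.0):
    """Hypothetical stand-in for the 3D network: soft occupancy of a sphere."""
    dist = math.sqrt(x * x + y * y + z * z)
    # Sigmoid of the signed distance to the surface: ~1 inside, ~0 outside.
    return 1.0 / (1.0 + math.exp(sharpness * (dist - radius)))

def sample_ray(origin, direction, near, far, n_samples):
    """Return n_samples evenly spaced points on the ray between near and far."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    step = (far - near) / (n_samples - 1)
    return [
        tuple(o + (near + i * step) * u for o, u in zip(origin, unit))
        for i in range(n_samples)
    ]

def render_mask_pixel(origin, direction, near=0.0, far=2.0, n_samples=32):
    """Assumed rule: pixel = 1 - prod(1 - occupancy) over the ray samples."""
    pass_through = 1.0  # probability that the ray has hit nothing so far
    for point in sample_ray(origin, direction, near, far, n_samples):
        pass_through *= 1.0 - occupancy(*point)
    return 1.0 - pass_through

# A ray through the sphere centre and a ray that passes outside the sphere.
hit = render_mask_pixel(origin=(0.0, 0.0, -1.0), direction=(0.0, 0.0, 1.0))
miss = render_mask_pixel(origin=(0.9, 0.0, -1.0), direction=(0.0, 0.0, 1.0))
```

Because the pixel value is a product of smooth terms, it varies smoothly with the occupancy values, which is what lets a 2D mask loss be backpropagated to the function that produces them. Memory use here depends on the number of samples per ray rather than on a stored grid.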

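The IoU figures on slides 12 and 14 compare a voxelized prediction with the ground truth. A minimal sketch of that metric on flattened occupancy grids follows. The 0.5 threshold and the function name `voxel_iou` are assumptions, since the slides do not state how predictions were binarized.

```python
def voxel_iou(pred, target, threshold=0.5):
    """Intersection over union of two flattened occupancy grids."""
    pred_bin = [p > threshold for p in pred]
    target_bin = [t > threshold for t in target]
    intersection = sum(p and t for p, t in zip(pred_bin, target_bin))
    union = sum(p or t for p, t in zip(pred_bin, target_bin))
    # Two empty grids agree perfectly; avoid dividing by zero.
    return intersection / union if union else 1.0

# Toy grids with four cells each: one cell overlaps, three are occupied in total.
pred = [0.9, 0.8, 0.2, 0.1]
target = [1.0, 0.0, 1.0, 0.0]
score = voxel_iou(pred, target)
```

For a continuous representation, the prediction would first be evaluated on a regular grid of query points (for example 32³) so that it can be compared cell by cell with the ground-truth voxels.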